It sounds easy but it's not: imagine you're the software controlling the SSD. Before you write, you don't know whether the write will succeed. Once you've written, the very information that holds the pointers to the valid data could be exactly what got destroyed. The "raw" data you can still access may then be "out of order", but that would still be better than nothing. I can imagine some special "recovery mode" that would let the user rescue the data blocks even with the uncertain ordering, in case you're willing to piece some of them back together. But almost certainly there aren't many people willing to pay for that.
"Read only" after some fixed number of writes would be safer. But then the complaints would be "why can't they just allow me to write as much as I can, I have the backup somewhere anyway." Which is also a valid wish. So it would be the best to be able if user could select the mode.
And what about the drive that went into read-only mode, and then on reboot bricked itself? By design?
I can easily see someone having their computer lock up / etc, restarting, and losing their data. As restarting is often one of the first debugging steps for so many things.
You know, the drives have their own CPU and RAM and run their own software. That software keeps a copy of the table needed to do proper reads in RAM and tries to write it back to flash in response to normal data writes. When the flash fails, the software gets notified. It can keep serving reads from the information in RAM for as long as that information stays in RAM, so it enters read-only mode. After a reboot only the bad bits on the flash remain; the RAM contents are completely wiped by the reboot.
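To make that concrete, here's a toy sketch in Python (all names, sizes and behaviour invented, nothing like real firmware): the mapping lives in RAM, every write tries to persist it to flash, and once persisting fails the drive can only serve reads until the power goes.

    # Toy model of an SSD controller's flash translation layer (FTL).
    # Purely illustrative: real firmware is vastly more complex.

    class ToyDrive:
        def __init__(self, flash_pages):
            self.flash = {}              # simulated flash: page -> data
            self.mapping = {}            # RAM copy: logical block -> flash page
            self.mapping_on_flash = None # last mapping successfully persisted
            self.read_only = False
            self.free_pages = list(range(flash_pages))

        def _persist_mapping(self):
            # Stand-in for writing the table to flash; fails at end of life.
            if not self.free_pages:
                return False
            self.mapping_on_flash = dict(self.mapping)
            return True

        def write(self, lba, data):
            if self.read_only:
                raise IOError("drive is read-only")
            if not self.free_pages:
                self.read_only = True
                raise IOError("flash exhausted, entering read-only mode")
            page = self.free_pages.pop()
            self.flash[page] = data
            self.mapping[lba] = page
            if not self._persist_mapping():
                # Data landed, but the pointer update never made it to flash:
                # keep serving reads from RAM, refuse further writes.
                self.read_only = True

        def read(self, lba):
            # Works as long as the RAM copy of the mapping exists...
            return self.flash.get(self.mapping.get(lba))

        def power_cycle(self):
            # ...but a reboot wipes RAM; only what was persisted survives.
            self.mapping = dict(self.mapping_on_flash or {})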
So have a small area of flash that's only used to dump the RAM to at EOL. I, for one, would much rather have a little bit less space available, with a better chance of not bricking at EOL, than the current situation.
Looks like a good idea to me; the question is how small the area could be. I think I've read somewhere that writing that data takes some 40 seconds on some Samsung drives. At write speeds around 400 MB/s that's roughly 16 GB reserved just so the user can still read a failed drive, a drive that had warned the user long before that the rated number of write cycles was already spent. Somebody who actually has better (industry insider) info is welcome to correct me.
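For what it's worth, the back-of-envelope from those two numbers (both from memory, and the table-size comparison below is purely my own assumption, not insider info):

    # Rough sizing, using the numbers quoted above (both could be off).
    write_speed_mb_s = 400      # sustained sequential write speed
    dump_time_s = 40            # reported time to dump the controller state
    print(write_speed_mb_s * dump_time_s / 1000, "GB")    # -> 16.0 GB

    # For comparison (my assumption, not insider info): a page-level mapping
    # for a 1 TiB drive with 4 KiB pages and 4-byte entries would be about
    # 1 TiB / 4 KiB * 4 B = 1 GiB, so much of those 16 GB would presumably be
    # other state, redundancy, or simply a conservative reservation.
    print(1 * 2**40 // (4 * 2**10) * 4 / 2**30, "GiB")    # -> 1.0 GiB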
>As restarting is often one of the first debugging steps for so many things.
If you're the kind of person who can troubleshoot a faulty drive, then you're not the kind of person who should reboot as a first step towards debugging. Especially if you suspect a faulty drive.
I suspect that most people, when a computer freezes, don't instantaneously suspect a faulty drive. At least not before they try other things, a large chunk of which require reboots.
>It sounds easy but it's not: before you write, you don't know whether the write will succeed. Once you've written, the very information that holds the pointers to the valid data could be exactly what got destroyed.
How is that different from any other write the drive had experienced though?
Every single other previous write had a way of retrying by way of spare flash cells. That last write is fundamentally different, because there are none left.
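That retry path is simple enough to sketch (toy Python, invented names and interfaces; real controllers do this per block with ECC and far more bookkeeping):

    # Toy version of the retry path: a failed program is retried on a page
    # from the spare pool; once the pool is empty the write cannot complete.

    class ToyFlash:
        def __init__(self, worn_out_pages):
            self.store = {}
            self.worn = set(worn_out_pages)

        def program(self, page, data):
            if page in self.worn:          # simulate a failed program operation
                return False
            self.store[page] = data
            return True

    def write_with_retry(flash, mapping, spare_pool, lba, data):
        page = mapping.get(lba)
        while page is None or not flash.program(page, data):
            if not spare_pool:
                raise IOError("no spare pages left; cannot complete write")
            page = spare_pool.pop()        # remap onto a spare page and retry
        mapping[lba] = page
        return page

    flash = ToyFlash(worn_out_pages={0})
    mapping, spares = {"lba0": 0}, [1, 2, 3]
    write_with_retry(flash, mapping, spares, "lba0", b"hello")  # lands on a spare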
Intel drives died after a power cycle because of a buggy SandForce chip/firmware. Intel's bullshit fairytale was a post facto rationalization. There is NO UNIVERSE where bricking a user's product full of data is a good thing, especially if you promised a read-only failure mode in the documentation.
Everything else should still be fine though. And the flash that holds the pointers (redirection table for wear leveling) can be a slightly higher quality flash that lasts just a little bit longer.
When the SSD bricks even for reads, it's the redirection table that got corrupted. Corruption means random bits are wrong, and once random bits in the table are wrong, the redirection is wrong. If the redirection table lives in separate flash, you have to spend proportionally more flash on the table than on the data, because that table is heavily stressed (the redirections have to be updated with nearly every write if you want good wear levelling). And if you offload part of the redirection table to the "normal" flash, you're back where you started: some random bits will go wrong after some write, and those are exactly the bits you depend on to read properly.
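A rough way to see why the table region wears faster than the data region (sizes invented purely for the ratio; real firmware batches and journals these updates, but the imbalance remains):

    # Every user write also dirties one mapping entry. If the table region is
    # much smaller than the data region, its pages get programmed far more
    # often than any individual data page.

    DATA_PAGES = 10_000          # invented sizes, just for the ratio
    TABLE_PAGES = 100

    data_writes = [0] * DATA_PAGES
    table_writes = [0] * TABLE_PAGES

    for i in range(1_000_000):               # a million user writes
        data_writes[i % DATA_PAGES] += 1      # the data itself
        table_writes[i % TABLE_PAGES] += 1    # the mapping entry it dirties

    print("programs per data page: ", sum(data_writes) / DATA_PAGES)    # -> 100.0
    print("programs per table page:", sum(table_writes) / TABLE_PAGES)  # -> 10000.0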
And as for power outages and hard resets, there are these nifty things known as capacitors. They only need to last as long as is necessary to write the data to flash.
A better option would be to spend the time repeatedly writing the pointers out in newer and newer iterations every N blocks, so that a fallback is available if one block holding the pointers fails.
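Which is more or less what a journaled/checkpointed mapping looks like. A toy sketch of the recovery side (invented layout, just to show the idea of falling back to the newest copy that still checks out):

    # Recovering the mapping from periodic checkpoints: each copy carries a
    # sequence number and a checksum; on boot, take the newest copy whose
    # checksum still verifies and fall back to older ones if it doesn't.
    import json, zlib

    def make_checkpoint(seq, mapping):
        payload = json.dumps({"seq": seq, "mapping": mapping}).encode()
        return zlib.crc32(payload).to_bytes(4, "big") + payload

    def load_newest_valid(checkpoints):
        best = None
        for raw in checkpoints:
            crc, payload = raw[:4], raw[4:]
            if zlib.crc32(payload).to_bytes(4, "big") != crc:
                continue                      # corrupted copy: skip it
            record = json.loads(payload)
            if best is None or record["seq"] > best["seq"]:
                best = record
        return best["mapping"] if best else None

    cp1 = make_checkpoint(1, {"lba0": 10})
    cp2 = make_checkpoint(2, {"lba0": 17})
    cp2_corrupt = cp2[:10] + b"X" + cp2[11:]      # flip a byte: checksum fails
    print(load_newest_valid([cp1, cp2_corrupt]))  # falls back to {'lba0': 10}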
Considering spinning rust used to fail catastrophically, sometimes very quickly, I'm really not bothered.
Plan for the worst, hope for the best.
In fact my personal advice is to plan to lose everything since your last backup. If you can't afford that risk, back up more often.
No disk technology is perfect, nor directly comparable. I consider this to be purely a 'value for money before it blows up' assessment. The results make me happy, as I've been recommending and using 840 Pros for a while now.
...and which magical hard drive do you buy that guarantees it will fail gracefully? Customer hard drive reviews are littered with sudden drive deaths and DOAs. If the electronics or motor go on a hard drive, the data is practically toast unless you're ready to pay out the nose for exotic data recovery.
This is why I really wish that some kind of "floppy" had kept up with HDD capacities. There, the drive electronics etc. were by design separated from the storage media, so having the control board or similar keel over just meant getting a new drive rather than losing the data.
Isn't MicroSD what you want? Even when hard discs were tiny they were bigger than any floppies. Zip drives were possibly the closest, but they were only used because they were portable, not because they were any good.
Zip drives could also go bad and screw up your data if the mechanism failed. Stupid click of death... a completely separate storage layer can't always save you.
SDs still have a layer of electronics between the storage IC and the rest of the system. There may have been a solid-state format like what I'm thinking about, but I think it is long defunct.
There was/is a packet-writing format that can be used, as far back as CD-R I think, but early Windows needed a special driver just to read the discs. Supposedly Universal Disk Format (UDF, DVD and later) has it baked in, but I can't say I have ever seen it used.
It is kind of like when you have a power loss while writing to a file system (say FAT32 or ext2) and when it comes back up you have to do an fsck or equivalent. If something gets messed up it may not come back in a consistent manner. Something similar happens at the SSD level, and it's one of the most complex aspects of an SSD to implement and test.
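At the SSD level the "fsck equivalent" is roughly: scan the flash, look at the per-page metadata (logical address plus a sequence number), and rebuild the mapping by keeping the newest copy of each logical block. Toy sketch with an invented layout, just to show the shape of it:

    # Toy FTL recovery scan: each programmed page is assumed to record which
    # logical block it holds and a monotonically increasing sequence number.
    # Rebuild the logical-to-physical mapping by keeping, for every logical
    # block, the copy with the highest sequence number.

    def rebuild_mapping(pages):
        """pages: iterable of (physical_page, lba, seq) for readable pages."""
        best = {}                                  # lba -> (seq, physical_page)
        for phys, lba, seq in pages:
            if lba not in best or seq > best[lba][0]:
                best[lba] = (seq, phys)
        return {lba: phys for lba, (seq, phys) in best.items()}

    scanned = [
        (0, "lba7", 1),    # old copy of lba7
        (1, "lba3", 2),
        (5, "lba7", 9),    # newer copy of lba7 written later
    ]
    print(rebuild_mapping(scanned))   # {'lba7': 5, 'lba3': 1}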
Not sure, but I'm imagining it's to avoid someone throwing out a read-only drive filled with sensitive data (and unable to erase it through conventional means - that is, writing to the drive)
Of course, there are other solutions to this problem: sledgehammer, etc
A bricked device still has the data on it. It's fake security.
And you could allow secure erase while still prohibiting normal writes. The cells aren't actually gone or unusable after all, just unreliable. A secure erase would work fine on them.
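That would just be a bit of policy in the command handler, something like this (toy Python, invented command names, nothing to do with any real firmware):

    # Sketch of end-of-life command gating: normal writes are refused, reads
    # still work, and a secure erase is still allowed because worn cells can
    # be erased even if they can no longer store data reliably.
    from enum import Enum, auto

    class Cmd(Enum):
        READ = auto()
        WRITE = auto()
        SECURE_ERASE = auto()

    class Controller:
        def __init__(self):
            self.end_of_life = False

        def handle(self, cmd):
            if self.end_of_life and cmd is Cmd.WRITE:
                return "error: media worn out, drive is read-only"
            if cmd is Cmd.SECURE_ERASE:
                return "erasing all blocks (still permitted at end of life)"
            return f"executing {cmd.name}"

    c = Controller()
    c.end_of_life = True
    print(c.handle(Cmd.WRITE))          # refused
    print(c.handle(Cmd.READ))           # still fine
    print(c.handle(Cmd.SECURE_ERASE))   # allowed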
Exactly. Sudden SSD death is the major reason I dare not use one for any serious purpose at the moment. A good backup scheme does help, but a sudden death is still too bitter a pill to swallow in practice.
This is why, for anything I care about, I have them in RAID1 or similar. I have a home server with a pair of SSDs for the system volumes (and hosting a few VMs: mail server, web/app servers, play things, ...) and a pile of spinning metal in RAID5 (I'm considering moving to 6 because of the "new error during rebuild as rebuild takes ages on large drives" issue) for media storage.
One of the SSDs died a couple of weeks ago (unexpectedly, Linux didn't fail safe properly: it fell over trying to read from the broken drive and didn't seem to try reading from the other one instead, though upon coming back up it ran from the good drive OK) and RAID saved me much hassle rebuilding from the previous night's backups.
The SSDs in my desktop box aren't RAIDed, nor in my laptop (two wouldn't fit), but I can easily survive either of those being out of action for a couple of days, and my backup regime, involving copies on the local file server and on remote hosts (fingers crossed), should mean nothing gets lost if they do die.
I've not found the failure rates of SSDs to be any different to spinning media - I don't trust important data to a single drive of either type.
I am NOT impressed with the behavior at end of life. In fact it makes me wary of using any SSD.
A bad SSD should go read only, and stay that way. Not self-brick, and make the data unrecoverable.