
I'm impressed with how long they lasted.

I am NOT impressed with the behavior at end of life. In fact it makes me wary of using any SSD.

A bad SSD should go read only, and stay that way. Not self-brick, and make the data unrecoverable.




It sounds easy but it's not: imagine you're the software controlling the SSD. Before you do a write you don't know if the write will succeed. Once you've written, exactly the information that gives the pointers to the valid data could be the one that is destroyed. Then the "raw" data you can access can be "out of order", but that would still be better than nothing. I can imagine that there would have to be some special "recovery mode" which would allow the user to rescue the data blocks even with the uncertain order, in case you are willing to piece some of them together. But almost certainly there aren't many people willing to pay for that.
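A minimal sketch of what such a recovery mode could look like, purely as an assumption (read_physical_page() and the geometry constants are invented here, not any real vendor API): it dumps every raw page in physical order, ignoring the possibly corrupt mapping, and leaves reassembly to whoever is desperate enough to try.

    /* Hypothetical recovery mode: dump raw pages in physical order, without
     * consulting the (possibly corrupt) logical-to-physical map. */
    #include <stdio.h>
    #include <stdint.h>

    #define PAGE_SIZE   4096
    #define TOTAL_PAGES (1024u * 1024u)            /* example geometry */

    int read_physical_page(uint32_t page, uint8_t *buf);  /* assumed firmware hook */

    void recovery_dump(FILE *out)
    {
        uint8_t buf[PAGE_SIZE];
        for (uint32_t p = 0; p < TOTAL_PAGES; p++) {
            if (read_physical_page(p, buf) != 0)
                continue;                          /* skip pages that won't read */
            fwrite(buf, 1, PAGE_SIZE, out);
        }
    }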

"Read only" after some fixed number of writes would be safer. But then the complaints would be "why can't they just allow me to write as much as I can, I have the backup somewhere anyway." Which is also a valid wish. So it would be the best to be able if user could select the mode.


And what about the drive that went into read-only mode, and then on reboot bricked itself? By design?

I can easily see someone having their computer lock up / etc, restarting, and losing their data. As restarting is often one of the first debugging steps for so many things.


You know, the drives have their own CPU and RAM, and execute their own software. The software keeps a copy of the table needed to do proper reads in RAM and tries to update it on the flash in response to normal data writes. The flash fails and the software gets a notification about that. The software can serve reads from the info in RAM as long as that info remains in RAM, so it enters read-only mode. After the reboot there are only the bad bits on the flash; the RAM contents are fully wiped away by the reboot.
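Roughly, the table being described is a logical-to-physical map kept live in controller RAM and periodically persisted to flash. A minimal sketch, with invented names and layout (not any vendor's actual flash translation layer):

    /* Illustrative FTL state: the l2p map lives in controller RAM and is
     * periodically persisted. Once persisting it starts failing, reads still
     * work from the RAM copy (the read-only phase), but after a power cycle
     * there is no valid map left on flash to rebuild from. */
    #include <stdint.h>

    #define LOGICAL_BLOCKS (256u * 1024u)

    struct ftl {
        uint32_t l2p[LOGICAL_BLOCKS];   /* logical block -> physical page, in RAM */
        int      read_only;             /* set once map writes start failing      */
    };

    int write_map_to_flash(const void *map, uint32_t len);  /* assumed firmware hook */

    void persist_map(struct ftl *f)
    {
        if (write_map_to_flash(f->l2p, sizeof f->l2p) != 0)
            f->read_only = 1;  /* keep serving reads from RAM; flash copy is stale */
    }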


The person you are replying to is talking about a "defective by design" problem they want solved. You are explaining how this defect works.


So have a small area of flash that's only used to dump the RAM to at EOL. I, for one, would much rather have a little bit less space available, with a better chance of not bricking at EOL, than the current situation.


Looks like a good idea to me; the question is how small the area would be. I think I've read somewhere that writing that data takes some 40 seconds on some Samsung drives, given write speeds around 400 MB/s, which is approximately 16 GB reserved so that the user can still read a failed drive, one which had warned the user, long before failing, that the declared number of write cycles was already spent. Somebody who actually has better (industry insider) info is welcome to correct me.
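For what it's worth, the back-of-the-envelope arithmetic behind that figure, using the numbers quoted above (which may well be off):

    40 s x 400 MB/s = 16,000 MB ≈ 16 GB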


Out of curiosity, I looked up SSD data recovery, and here is what one of the first Google hits had to say about it, in a prominent location.

"If your SSD has stopped working and you cannot access the data, the best thing to do is to turn it off to prevent further damage." [1]

Maybe that advice needs to be reconsidered?

[1] http://www.payam.com.au/content_common/pg-ssd-data-recovery....


> As restarting is often one of the first debugging steps for so many things.

If you're the kind of person who can troubleshoot a faulty drive, then you're not the kind of person who should reboot as a first step towards debugging. Especially if you suspect a faulty drive.


I suspect that most people, when a computer freezes, don't instantaneously suspect a faulty drive. At least not before they try other things, a large chunk of which require reboots.


>It sounds easy but it's not: before you write, you don't know if the write will succeed. Once you've written, exactly the information that gives the pointers to the valid data could be the one that is destroyed.

How is that different from any other write the drive had experienced though?


Every single other previous write had a way of retrying by way of spare flash cells. That last write is fundamentally different, because there are none left.


So you keep some in reserve and go read-only when only x remain. Why auto-brick on next power cycle?


It's a solved problem (compare and swap, etc.).

Intel drives died after a power cycle because of a buggy SandForce chip/firmware. Intel's bullshit fairytale was a post facto rationalization. There is NO UNIVERSE where bricking a user's product full of data is a good thing, especially if you promised a read-only failure mode in the documentation.
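A sketch of the kind of atomic update being alluded to, under assumed helper names: keep two slots for the mapping table, stamp each copy with a sequence number and a checksum, and always overwrite the older slot. A torn or failed write then only damages the copy being written, and boot picks the newest slot that still verifies.

    /* Two-slot ("ping-pong") persistence of the mapping table. */
    #include <stdint.h>
    #include <string.h>

    struct map_slot {
        uint64_t seq;          /* generation number, increases on every save */
        uint32_t crc;          /* checksum over the table payload            */
        uint8_t  table[4096];  /* mapping table payload (truncated here)     */
    };

    uint32_t crc32(const void *buf, uint32_t len);             /* assumed */
    int flash_write_slot(int slot, const struct map_slot *s);  /* assumed */
    int flash_read_slot(int slot, struct map_slot *s);         /* assumed */

    int save_map(uint64_t seq, const uint8_t *table)
    {
        struct map_slot s;
        s.seq = seq;
        memcpy(s.table, table, sizeof s.table);
        s.crc = crc32(s.table, sizeof s.table);
        return flash_write_slot(seq & 1, &s);   /* alternate between slots 0 and 1 */
    }

    int load_map(struct map_slot *out)
    {
        struct map_slot a, b;
        int ok_a = flash_read_slot(0, &a) == 0 && crc32(a.table, sizeof a.table) == a.crc;
        int ok_b = flash_read_slot(1, &b) == 0 && crc32(b.table, sizeof b.table) == b.crc;
        if (!ok_a && !ok_b)
            return -1;                          /* no valid copy at all */
        *out = (ok_a && (!ok_b || a.seq > b.seq)) ? a : b;
        return 0;
    }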


Everything else should still be fine though. And the flash that holds the pointers (redirection table for wear leveling) can be a slightly higher quality flash that lasts just a little bit longer.


When the SSD is bricked for reads it's the redirection table that is corrupted. The corruption means that random bits are wrong. Once random bits are wrong, the redirection is wrong. If the redirection table is in separate flash, you have to spend more flash on the redirection table than on the data, as that table will be heavily stressed (the redirections have to be updated often with the writes if you want good wear levelling). If you offload part of the redirection table to the "normal flash" you're back at the beginning: some random bits will be wrong after some write and you depend exactly on them to be able to read properly.


I would put the entire redirection table in RAM, and only write it to flash (to a dedicated area) when turning off the power.

By doing that, that area will easily outlast the regular flash.


Hope you have ECC RAM :)

Also hope you never have a power outage or hard reset of any kind.


ECC RAM? Easy.

And as for power outages and hard resets, there's these nifty things known as capacitors. It only needs to last as long as is necessary to write the data to flash.
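A rough sketch of that capacitor-backed approach, with invented names: on a power-loss interrupt the controller spends its few milliseconds of hold-up energy dumping the RAM-resident table to a reserved flash region, so the next boot can reload it instead of finding nothing.

    /* On power loss, persist the RAM copy of the mapping table before the
     * hold-up capacitors drain, so it survives the power cycle. */
    #include <stdint.h>

    extern uint8_t  map_in_ram[];                              /* live mapping table */
    extern uint32_t map_size;

    int  flash_write_reserved(const void *buf, uint32_t len);  /* assumed */
    void controller_power_down(void);                          /* assumed */

    void on_power_loss_irq(void)
    {
        flash_write_reserved(map_in_ram, map_size);  /* must finish within hold-up time */
        controller_power_down();
    }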


A better option would be to spend lots of time repetitively writing down the pointers in newer and newer iterations every N blocks, so that a fallback is available if one block holding the pointers fails.
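In other words, something like checkpointing the table every N writes into an append-only area, each checkpoint carrying a sequence number, so that boot can fall back to the previous checkpoint if the newest one won't read back. Purely illustrative, with assumed helpers:

    /* Checkpoint the mapping table every N user writes; losing the newest
     * checkpoint then costs at most the last N writes' worth of updates. */
    #include <stdint.h>

    #define CHECKPOINT_EVERY 1024

    int append_checkpoint(uint64_t seq);   /* assumed: writes table + seq to flash */

    static uint64_t write_count, checkpoint_seq;

    void on_user_write(void)
    {
        if (++write_count % CHECKPOINT_EVERY == 0)
            append_checkpoint(++checkpoint_seq);
    }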


Considering spinning rust used to fail catastrophically, sometimes very quickly, I'm really not bothered.

Plan for the worst, hope for the best.

In fact my personal advice is to plan to lose everything since your last backup. If you can't afford that risk, back up more often.

No disk technology is perfect nor directly comparable. I consider this to be purely a 'value for money before it blows up' assessment. The results make me happy as I've been recommending and using 840 Pros for a while now.


> Considering spinning rust used to fail catastrophically

That's one of the things I look for in a hard disk - how does it act when it fails. A "good" failure is some bad sectors, but otherwise OK.


...and which magical hard drive do you buy that guarantees it will fail gracefully? Customer hard drive reviews are littered with sudden drive deaths and DOAs. If the electronics or motor go on a hard drive, the data is practically toast unless you're ready to pay out the nose for exotic data recovery.


This. ANY error and it goes in the bin regardless of tech.


This is why I really wish that some kind of "floppy" had kept up with HDD capacities. There the drive electronics etc. were by design separated from the storage media. As such, having the control board or similar keel over just meant getting a new drive.


Isn't MicroSD what you want? Even when hard discs were tiny they were bigger than any floppies. Zip drives were possibly the closest, but they were only used because they were portable, not because they were any good.


Zip drives could also go bad and screw up your data if the mechanism failed. Stupid click of death... a completely separate storage layer can't always save you.


Well the problem there was that, iirc, they had the RW mechanics alongside the platter. Thus they got bumped around during transport.

Without that, the damage would have been piecemeal as sectors went bad, rather than watching a chip go poof and losing access to everything.


SDs still have a layer of electronics between the storage IC and the rest of the system. There may have been a solid state format like what I'm thinking about, but I think it is long defunct.


What's wrong with BD-R?


Can't really be used in a "drag and drop" fashion.


Wasn't DVD-RAM meant to work for that? Did anyone ever use it?


There was/is a packet writing format that can be used, as far back as CD-R I think, but early Windows needed a special driver just to read them. Supposedly Universal Disk Format (UDF, DVD and later) has it baked in. But I can't say I have ever seen it used.


Is there any explanation for the bricking on power cycle? The reads should be fine forever.


It is kind of like when you have a power loss while writing to a file system (like, say, FAT32 or ext2) and when it comes back up you have to do an fsck or equivalent. If something gets messed up it may not come back in a consistent manner. Something similar happens at the SSD level. It's one of the most complex aspects of an SSD to implement and test.


Not sure, but I'm imagining it's to avoid someone throwing out a read-only drive filled with sensitive data (and unable to erase it through conventional means - that is, writing to the drive)

Of course, there are other solutions to this problem: sledgehammer, etc


A bricked device still has the data on it. It's fake security.

And you could allow secure erase while still prohibiting normal writes. The cells aren't actually gone or unusable after all, just unreliable. A secure erase would work fine on them.
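Concretely, the firmware's command dispatch in the worn-out state could keep serving reads and a secure erase while rejecting ordinary writes. A sketch with schematic command names (not the actual ATA/NVMe opcodes):

    /* End-of-life policy: serve reads, refuse normal writes, still allow a
     * secure erase, since erasing worn cells doesn't require them to hold
     * new data reliably afterwards. */
    #include <stdint.h>

    enum cmd { CMD_READ, CMD_WRITE, CMD_SECURE_ERASE };

    int do_read(uint64_t lba, void *buf);   /* assumed */
    int do_secure_erase(void);              /* assumed */

    int dispatch_at_eol(enum cmd c, uint64_t lba, void *buf)
    {
        switch (c) {
        case CMD_READ:         return do_read(lba, buf);
        case CMD_SECURE_ERASE: return do_secure_erase();
        case CMD_WRITE:        return -1;   /* read-only: reject */
        }
        return -1;
    }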


"A bricked device still has the data on it. It's fake security"

No, because they do this http://vxlabs.com/2012/12/22/ssds-with-usable-built-in-hardw...

Erase the AES key -> data is gone
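The mechanism in the linked article, roughly: user data is stored encrypted under a drive-internal key, so destroying that one small key record is enough to make all of the remaining ciphertext unreadable. A sketch with invented names and an invented key location:

    /* Crypto-erase: user data is ciphertext under a media key. Erasing the
     * single flash block holding the key leaves the ciphertext physically
     * present but unrecoverable. */
    #include <stdint.h>

    #define KEY_BLOCK 0u                        /* illustrative key location */

    int flash_erase_block(uint32_t block);      /* assumed */

    int crypto_erase(void)
    {
        return flash_erase_block(KEY_BLOCK);    /* key gone -> data unreadable */
    }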


And it would be extremely easy to implement, because you only have to issue an erase on every block in the device, which can be done in parallel.


Exactly. SSDs' sudden death is the major reason that I dare not use them for any serious purpose at the moment. A good backup scheme does help, but a sudden death is still a bitter pill to swallow in reality.


This is why, for anything I care about, I have them in RAID1 or similar. I have a home server with a pair of SSDs for the system volumes (and hosting a few VMs: mail server, web/app servers, play things, ...) and a pile of spinning metal in RAID5 (I'm considering moving to 6 because of the "new error during rebuild as rebuild takes ages on large drives" issue) for media storage.

One of the SSDs died a couple of weeks ago (unexpectedly, Linux didn't fail safe properly: it fell over trying to read from the broken one and didn't seem to try reading from the other instead; upon coming back up it ran from the good drive OK) and RAID saved me much hassle rebuilding from the previous night's backups.

The SSDs in my desktop box aren't RAIDed, nor in my laptop (two wouldn't fit), but I can easily survive either of those being out of action for a couple of days, and my backup regime involving copies on the local file server and on remote hosts should (fingers crossed) mean nothing gets lost if they do die.

I've not found the failure rates of SSDs to be any different to spinning media - I don't trust important data to a single drive of either type.



