Analysis of SSD Reliability during power outages (lkcl.net)
168 points by yapcguy on Dec 27, 2013 | 104 comments



The author of this study submitted it to Slashdot with the following summary: "After the reports on SSD reliability and after experiencing a costly 50% failure rate on over 200 remote-deployed OCZ Vertex SSDs, a degree of paranoia set in where I work. I was asked to carry out SSD analysis with some very specific criteria: budget below £100, size greater than 16Gbytes and Power-loss protection mandatory. This was almost an impossible task: after months of searching the shortlist was very short indeed. There was only one drive that survived the torturing: the Intel S3500. After more than 6,500 power-cycles over several days of heavy sustained random writes, not a single byte of data was lost. Crucial M4: failed. Toshiba THNSNH060GCS: failed. Innodisk 3MP SATA Slim: failed. OCZ: failed hard. Only the end-of-lifed Intel 320 and its newer replacement, the S3500, survived unscathed. The conclusion: if you care about data even when power could be unreliable, only buy Intel SSDs."

Source: http://hardware.slashdot.org/story/13/12/27/208249/power-los...


Did he do any control studies with spinning disks?

I have my doubts that many of them would survive thousands of power cycles per day.


I've seen a few different models of 2.5" Enterprise SAS drives go through a similar test (as a control), and they were perfectly fine with no corruption after around 2 weeks of reboots at approximately 10-minute intervals.

It's the SSDs that I worry about - some store their own firmware on the same NAND as the user's data, and allow it to become corrupted during power-loss events. Several cheap models won't last more than a couple of days before they simply drop off the SATA bus and never come back.


There's probably not much of anything that would survive that many power cycles in a day.

The test is pretty pointless.


From what they said about these reading sensor data, I'm picturing a factory, industrial, or field type scenario: a very small embedded computer with some sensors attached and an internal SSD. Referencing a 1.5GB OS like they did makes me think they just have a normal OS installed, but it's entirely possible that they have a modified kernel and fast boot process.

Essentially, it sounds like they have a single unit that gets turned on, gathers sensor readings, and then gets turned off. I'm guessing these are not handheld devices, otherwise they most likely would have a battery attached. In a non-24/7 factory, these could be turned off with the rest of the assembly line by someone throwing breakers off on the way out (very common).


What if I have thousands of disks deployed, and I suffer a single power loss? What are the chances that at least one disk will contain corrupt data after such an event?

Instead of buying thousands of disks to test this scenario, one reasonable shortcut might be to repeatedly test a single disk.

Even a spinning disk can handle a few thousand power cycles in its lifetime, surely?


On power up, there's a pretty big spike in power usage as the controller gets its shit together. Normally, this is not an issue, since the SSD housing simply absorbs the heat and then goes about its business. The same holds true for spindle drives.

What this guy did was test the drives in a manner they weren't designed for. Sure, I can drop Corvettes off a 20-story building, then bitch about the results, but that wouldn't change the fact that my test was flawed from the outset.

All he did was subject something to an environment it wasn't designed for.

Sure, I can drop a Corvette from a 20 story building, but there's nothing to be gained when the crumple zone is packed into the tail lights.


Huh? Where would that spike in power usage come from?


It comes from the decoupling/filter capacitors used in the DC power circuit. When the power is turned on after a sufficiently long time these capacitors are all uncharged and appear as "shorts" to the power source thus drawing large amounts of current. This initial surge current drops off as the capacitors get charged.
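
To put rough numbers on that (a back-of-envelope sketch; the capacitance, rail voltage and series resistance below are illustrative assumptions, not figures from any SSD datasheet):

  # Rough, illustrative estimate of capacitor inrush at power-on.
  C = 100e-6      # 100 uF of total bulk/decoupling capacitance (assumed)
  V = 5.0         # 5 V SATA power rail
  R = 0.1         # ~0.1 ohm combined ESR + source resistance (assumed)

  peak_inrush = V / R              # capacitor looks like a short at t=0
  stored_energy = 0.5 * C * V**2   # energy that ends up stored in the caps

  print(f"peak inrush ~ {peak_inrush:.0f} A (for microseconds)")
  print(f"energy stored ~ {stored_energy * 1e3:.2f} mJ")

The surge is brief and most of that energy ends up stored, not dissipated, which is the point the replies below make.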


Filter capacitors producing excessive heat due to inrush current? That would be bad filter capacitors indeed ;-)

The claim was that an SSD somehow converted more electric energy into heat immediately after power-up which would damage the SSD, so real consumption, not just a current peak that goes into storage for later consumption. Normal-ESR electrolytics might have a heat problem when used at a few kHz in switching applications, but certainly not at 0.1 Hz.


It's long been standard practice with tantalum filter capacitors to feed them through an inductor or at least a resistor to prevent inrush current failures. That, and/or you derate the crap out of them when you design the board. Newer drives are probably using multilayer ceramics that can put up with just about any abuse including inrush.

Executive summary: powerup stress is not an issue unless the drive was designed by a moron.


It's not pointless, just misrepresented. SSDs enable new classes of applications for high-performance, non-power-protected computers. With an SSD and Intel ULV CPU, I can put together a cheap high-performance computer with no moving parts and no batteries. That has the potential to be very robust and have an expected lifetime of decades, but only if the SSD is rock solid.


Which SSD is going to last for decades in a strenuous environment?


Intel, apparently


This test really doesn't prove that at all. There are vastly different factors between power cycling in a single day and long term continuous use. You can't take results from one and get any meaningful information about the other.


You're right. But it does show one way in which it's resilient. Given this information and what else I know about Intel SSDs, I would expect them to be the most resilient overall.


What I find rather annoying is that manufacturers/marketers rarely state whether or not an SSD has a supercapacitor. Here are my findings:

- OCZ Vertex 4 SSD does (but this is not advertised)

- OCZ Vertex 3 does not, Vertex 3 Pro does

- Ditto for Vertex 2 / Vertex 2 Pro

- OCZ Deneva(2) R does, other OCZ Deneva/Deneva 2 do not -- and in the case of the Deneva 2 R this is advertised

- Intel 320 does (but is only SATA II), though this is not advertised at all

- Intel 520/530 does not

- Intel 330/335: unclear

- Intel S3700/S3500: does and this is advertised

Background: I was looking for an SSD to use for a ZFS intent log ("ZIL" -- ZFS's write ahead log) -- my requirements were a sandforce controller (or equivalent) and toggle NAND (so that I could use the same disk for both ZFS cache and the ZIL), and a supercap. This was surprisingly hard to find.

What I'd like to see is:

1) Data on which SSDs have supercap

2) Data on which SSDs actually honour cache flush requests (then a UPS + forcing a cache flush upon power failure + redundant power supplies/multiple replicas in a distributed system would suffice).

3) Best yet: have an API to check whether or not an SSD has a supercap, whether or not the supercap still holds charge, and its policy for honouring cache flush requests. Let the OS decide based on a policy I set.

If you are a consumer, the practical recommendation is not very different from the practical recommendation I'd give to anyone using spinning disks: RAID1/1+0/RAIDZ(-2) (with SSDs coming from different batches so that they do not wear out at the same time), UPS (for power outages), backups (against yourself and against power supply failures).

For production: obviously use a UPS, put the WAL of your database on an SSD with a supercap, make sure that your database fsync()'s the WAL at a reasonable interval (on every transaction is probably unreasonable, but so is once an hour), and use a distributed system that replicates the WAL. If using a distributed system with (semi-)synchronous WAL replication is not an option and losing incremental data is not acceptable, use redundant power supplies.
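
As a minimal sketch of what "fsync the WAL at a reasonable interval" can look like (a toy group-commit loop; the file name and the 50 ms interval are assumptions, not how any particular database implements it):

  import os, time

  # Append WAL records and fsync at most every FLUSH_INTERVAL seconds
  # instead of on every record. Records written since the last fsync
  # are the ones at risk on power loss.
  FLUSH_INTERVAL = 0.05  # 50 ms between forced flushes (assumed)

  fd = os.open("wal.log", os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
  last_flush = time.monotonic()

  def append_record(record: bytes) -> None:
      global last_flush
      os.write(fd, record)
      now = time.monotonic()
      if now - last_flush >= FLUSH_INTERVAL:
          os.fsync(fd)          # data is only durable once this returns
          last_flush = now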


Rather than a supercapacitor, the Crucial M500 uses a small array of capacitors:

All of the M500 line includes hardware AES 256-bit encryption, and Micron showed us an array of small capacitors on one of the M.2 form factor drives that supported flushing of all data to the NAND in the event of a power loss--not a super capacitor as seen in enterprise class SSDs, but there's no RAM cache to flush so it's just an extra precaution to ensure all of the data writes complete. http://www.anandtech.com/show/6614/microncrucial-announces-m...

Other features that set the M500 apart also center on optimizations that are clearly holdovers from the enterprise version of the SSD. Power hold-up is provided by a small row of capacitors that will flush all data in transit to the NAND in the event of a power loss. This feature is not standard with any other consumer SSD on the market, and in enterprise SSDs power capacitors typically command a much higher price structure. Finding power loss protection on the consumer M500 is a nice surprise, and one that users will need more often than they think. http://www.hardocp.com/article/2013/05/28/crucial_m500_480gb...


put the WAL of your database on an SSD

If you're talking about PostgreSQL, the general consensus from the mailing lists and IRC discussions seems to be that you should in fact not put your WAL on SSDs. $PGDATA/base, yes; $PGDATA/pg_xlog, no.

The amount of write churn in your WAL will burn through your SSD's wear leveling very quickly, and given that the WAL is nearly perfectly sequential in access pattern, you lose (forfeit) nearly all the random IO benefit an SSD buys you.

My production master/slave pair only have SSD in them, so I'm "doing this wrong", given that advice. Our SSDs are SLC NAND, however, so their write endurance is vastly, vastly higher than the drives under consideration in both your comment, and TFA.


  What I'd like to see is:
  1) Data on which SSDs have supercap
Every SSD review I've read at Anandtech.com or StorageReview.com has mentioned the presence of a supercap on drives that have them.

When they review consumer-oriented drives without a supercap, they typically don't say "there's no supercap," but 1) there are usually physical teardown pics so you can fairly easily check for yourself, and 2) you can be pretty sure there isn't one if they don't mention it, because they always talk about it when there is one.

This doesn't help with this, of course:

  2) Data on which SSDs actually honour cache flush requests (then a UPS + forcing a cache flush upon power failure + redundant power supplies/multiple replicas in a distributed system would suffice).
Would love to see reviewers test that as well. For now it seems like there's no way to know unless you're willing to do your own power-cycle torture testing.


I guess I don't understand why redundant power supplies aren't the answer in all of these cases. I'd rather nothing goes down in the event of a power failure, including the rest of the server. Even the low-end Synology RAID I use at home has two power supplies which I could (but don't bother to) plug into separate UPSes.


> If using a distributed system with (semi-)synchronous WAL replication is not an option and losing incremental data is not acceptable, use redundant power supplies.

Also be sure to plug each redundant power supply into a separate UPS, and avoid equipment that has redundant power supplies on a single feed. UPS failures are a bear.


And also make sure no UPS is ever loaded over ~40%, so when one of the pair fails the other can handle the full load. And always set them up in isolated pairs, with no equipment plugged into a pair member and a non-member, to prevent cascading failures.


Um, does anyone know if the Kingston SSDNow KC300 has any power-loss protection? Because Limestone Network are currently using them.

I have always wondered why hosting companies aren't choosing the Intel DC series at all. As far as I know, one of the few hosting companies using it is Hivelocity.


It's probably more economical for hosting companies to use lots of cheap drives in a RAID configuration. For my own desktop PC, I'd rather use a single SSD with a good reputation for reliability, together with frequent backups.

Of course, a RAID is only useful when the failure is drastic enough for the controller to notice it, which some of these power-down failures may not be.


> Best yet: have an API to check whether or not an SSD has a supercap, whether or not the supercap still holds charge, and its policy for honouring cache flush requests. Let the OS decide based on a policy I set.

I'm sure there's some SMART flag that shows it. No idea which though, not one of them is properly documented.


Terrible title, terrible science, terrible conclusion.

Spoiler: He only tested these five drives, only Intel survived, so if they are your candidates, apply his conclusion:

    Crucial M4
    Toshiba THNSNH060GCS 60gb
    Innodisk 3MP Sata Slim
    OCZ Vertex 32gb
    Intel 320 and S3500
Notably missing is Samsung, the runner-up to Intel, and probably others I'm not aware of, as well as other models.


The author mentions that he only evaluated drives that have some form of power loss protection - doing some quick searching around, I couldn't find any Samsung drives in the given price range that claimed to have that.

Did I miss an appropriate Samsung drive in my quick searches? Or is there reason to suspect that a Samsung drive that doesn't claim to have power loss protection would nonetheless handle this case better than the non-Intel drives that did make that claim? Because if not, then I don't think not evaluating Samsung drives compromises the results in any way.


The Samsung SM843T has power loss protection, among other things, and is priced really competitively to the Intel S3500.


A little harsh, don't you think? I'd say it's still more useful to have 5 data points than 0.


I'd have been gentle if his conclusion had included a measure of humbleness and at least kind of approached being somewhat accurate. :)


And where was the "terrible science"? Requiring "humbleness" in posts seems to only apply to other people's posts, apparently.


The conclusion "buy intel" makes no sense. He tested far too few drives to make that conclusion, and many potentially safe alternatives went untested. Not to mention the fact that not all intel drives apparently are safe.

So yeah; it's unfortunately bogus science - he's drawing invalid conclusions from a far too small sample.


Terrible science = small, restricted subject population, from which an ironclad blanket conclusion is made, that covers the entire population.

One of the restrictions is price, which makes it hard to justify such a strong conclusion about quality. Not to mention that only a single specimen of a single model of each brand was tested. The top-voted comment in the thread points out some Intel models that don't have supercapacitors in them.

Already the article is updated with a couple of other drives to test. I guess that wasn't "End of discussion" after all...


He explained why he chose those 5. Only those 5 advertised certain features he needed.


The article wasn't written by a guy claiming to do a comprehensive survey of every SSD on the market.

He tested a bunch of equipment, and was kind enough to share his methods and results. Lot of people/companies don't do that.

He's not misrepresenting what he did, and he provided some valuable data, so kudos to him.


Why must SSDs have "power loss protection" in the form of a battery or supercapacitors to finish writes? Can they not simply cache writes without falsely claiming to the OS that they are fully written, and do some internal housekeeping (journal-like) for recovery? Does SATA/SCSI support the concept of "sync", or would allowing the disk to fall behind without deceiving the OS kill performance in some way?


Inside the SSD are flash memory chips. It takes a relatively long time (typically 10 ms) to tunnel the charge that changes a '1' bit (the erased state) to a '0' bit. If the power goes away during that programming time, the result is indeterminate. Not just bad, but bad in a potentially unknown way.

The worst case, which I've experienced with direct writing to flash chips, is that a totally unexpected flash location is corrupted. My guess was that the CPU started the write cycle just as power was lost, the CPU glitched the address lines as it was losing its brains, and the flash corrupted a random location.

Very, very bad.

If a power loss causes the flash to scribble on the wrong SSD location (e.g. the tables that keep track of good and bad blocks), the SSD "dies".


>If a power loss causes the flash to scribble on the wrong SSD location (e.g. the tables that keep track of good and bad blocks), the SSD "dies".

The problem isn't that data is corrupted in the flash; the problem is that the device's own firmware (which the SSD's embedded controller, a tiny computer in itself, "boots" from) is stored in the same memory used for data storage. They could've gotten around this by not storing the firmware on the actual NAND used for data, but on a separate device (or by keeping it inside the controller itself), so that a power loss might cause data corruption but not render the SSD completely unresponsive and inoperable.


That's still no reason why you would need "power loss protection" in the sense of energy storage. What is needed is proper brown-out detection and properly set up reset circuitry so that the write gets aborted before lines start glitching (that is to say, energy needs to be dumped so it cannot cause any damage once the CPU starts losing its brains).

That the storage location where a write was in progress is indeterminate afterwards shouldn't matter - between the time that some software initiates a write() and the time an fsync() on the same file returns, there is no guarantee what the written location will contain after a power failure, and if your software relies on the value in any way whatsoever, your software is broken.
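
For illustration, the usual way application code builds an all-or-nothing update on top of that contract is the write-temp-fsync-rename pattern (a generic POSIX sketch, not specific to any SSD or filesystem):

  import os

  # Crash-safe replace of a small file: until fsync() and the rename
  # complete, a crash may leave the old contents, the new contents, or
  # (for the temp file) garbage -- but never a torn "real" file.
  # Paths are illustrative.
  def atomic_write(path: str, data: bytes) -> None:
      tmp = path + ".tmp"
      fd = os.open(tmp, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
      try:
          os.write(fd, data)
          os.fsync(fd)                  # flush file data to the device
      finally:
          os.close(fd)
      os.rename(tmp, path)              # atomic on POSIX filesystems
      dirfd = os.open(os.path.dirname(path) or ".", os.O_RDONLY)
      try:
          os.fsync(dirfd)               # persist the directory entry too
      finally:
          os.close(dirfd)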


Yes there is. The amount of hold-up time is specified in the flash manufacturer's data sheets.

The problem is that the flash has an internal state machine that performs the charge tunneling as an iterative process: it tunnels some charge, checks the level on the floating gate, and repeats as necessary. If the power to the flash chip goes away or glitches during this internal programming process, the flash write fails in indeterminate and sometimes very unexpected ways.
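
A toy model of that program-verify loop (purely illustrative; the pulse size and levels are made up):

  # Charge is added in small pulses until the cell reads back at the
  # target level. If power vanishes mid-loop, the cell is left at some
  # arbitrary intermediate charge -- neither erased nor programmed.
  def program_cell(cell_level: float, target: float, pulse: float = 0.1) -> float:
      while cell_level < target:      # iterative tunnel-and-verify
          cell_level += pulse         # tunnel a bit more charge
          # <-- power loss here leaves cell_level indeterminate
      return cell_level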


Well, yeah, of course, you need some limit on the speed at which the power supply voltage drops, I was talking about the "flushing the cache" kind of "power loss protection", not claiming that pulling a circuit into the reset state could happen without latency ;-)

So, yes, you of course have to have some low pass in the power supply rail to make sure that power drops no faster than you can handle shutting down the circuit in an orderly fashion - all I am saying is that there is no need to guarantee that a read of a region where a yet-unacknowledged write was happening when the power supply failed returns non-random data, so it is perfectly fine to interrupt the programming process and leave cells where user data is stored in an indeterminate state. It's not OK to glitch address lines while programming is still going on, of course :-)


The "super capacitors" (almost nobody uses actual super capacitors after early models discovered that super capacitor lifetime at server temperature was inadequate) are just a low-pass filter - they usually only keep the drive online for a couple dozen milliseconds after main power goes down.

Most reasonable SSDs do not write-cache at all, but thanks to the wear-leveling issues, they need to have a sector-mapping table to keep track of where each sector actually lives. That table takes many, many updates, and since it's usually stored in some form of a tree, it's expensive to save to media that does not support directly overwriting data (i.e. NAND, which requires a relatively long erase operation to become writable). This table is typically what is lost during power events, and it is not usually written out when you sync a write.

So what happens is you write, sync, get an ack, lose power, reboot, and magically that sync'd data is either corrupt, or, even worse, it's regained the value it had before your last write, with no indication that there is a problem. This can cause some extremely interesting bugs.
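
A toy model of that mapping table (purely illustrative, not any vendor's FTL) shows why losing it loses the ordering even though the bits are still physically on the NAND:

  # Toy flash translation layer: logical block address -> physical page.
  # Real FTLs are trees persisted incrementally to NAND; this just shows
  # why losing the table loses the data's *order*, not the data itself.
  ftl = {}              # lba -> physical page number
  next_free_page = 0

  def write_sector(lba: int, data: bytes, nand: dict) -> None:
      global next_free_page
      nand[next_free_page] = data      # data always goes to a fresh page
      ftl[lba] = next_free_page        # remap the logical address
      next_free_page += 1

  nand = {}
  write_sector(7, b"old", nand)
  write_sector(7, b"new", nand)
  # After power loss, 'nand' still holds both b"old" and b"new", but
  # without 'ftl' there is no way to tell which one sector 7 should be.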


10 milliseconds is far too long to wait for a write in application code.


Huh? What kind of applications do you write where 10 ms is too much latency?!


I can see you didn't spend much time with early SSDs back in the bad old days when 2.5" SSDs were unsophisticated and were essentially using the same architecture you'd find inside a $5 SD card from Walgreens. Those SSDs often performed worse than uncached HDDs when you threw random writes at them.

  Huh? What kind of applications do you write where 10 ms is too much latency?!
10ms per write is huge. Keep in mind that even a tiny 1KB database insert/update will involve multiple writes: the filesystem journal, the database journal, and the data itself. 10ms for each of those steps adds up quickly.

Alternately, consider writing a modestly-sized file to disk, like a 5MB mp3 file. You're spanning multiple flash cells (probably 40 128KB cells at a bare minimum, plus filesystem journaling etc.) at that point. Now you're close to half a second of total latency. Oh, you're writing an album's worth of .mp3 files? We're up to five seconds of latency now. But probably more like ten seconds, if your computer is doing anything else whatsoever that involves disk writes in the background.

So yeah, 10ms write latency is no fun.
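
Spelling out that arithmetic (5MB tracks and 128KB blocks as above; the 12-track album is an assumption), and assuming every block write is serialized:

  # Back-of-envelope check of the numbers above (sizes assumed).
  write_latency = 0.010                 # 10 ms per flash program
  cell = 128 * 1024                     # 128 KB flash block
  track = 5 * 1024 * 1024               # ~5 MB per mp3
  tracks_per_album = 12                 # assumed

  writes_per_track = track // cell              # 40 blocks
  per_track = writes_per_track * write_latency  # ~0.4 s
  per_album = per_track * tracks_per_album      # ~4.8 s

  print(f"{writes_per_track} blocks, {per_track:.1f} s/track, {per_album:.1f} s/album")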


Why would you want to order the write of every 128 kB chunk of your MP3 album with regard to every other? A rotating HDD also has a latency of around 10 ms, and you certainly can write an album's worth of MP3 files faster than in 10 seconds - unless you insist on flushing every 128 kB chunk to the medium separately for no apparent reason, of course.

That you need multiple serialized writes for a commit is a valid point, but I would think that for most applications even a write transaction latency of 100 ms isn't a problem. Also, if you overwrite contents of an existing file without changing its size, you don't need any FS journaling at all, as the FS only needs to maintain metadata consistency, it's not a file-content transaction layer (details depend on the FS, obviously).

Cheap SD cards aren't slow because they have a 10 ms write latency, but because they have a very low IOP rate, which you also could not change by adding a cache and a buffer capacitor, but only by parallelizing writes to flash cells.


One main reason is that an SSD does not store the data where you ask it to; it maps that location to whatever other location it sees fit. That means it maintains an address-to-physical-location mapping. If the SSD loses that mapping, your data is gone. You simply cannot find the order of things.

There are some recovery strategies employed by the SSD firmware but they can't handle all possible scenarios and you are likely to lose some data in case of an unprotected power failure.

Also, as mentioned before, the write takes time, and not finishing it in time will leave the location being written to with unreadable data.


The main issue is the firmware being stored in the same flash memory as the rest of the data, so it's subject to the same corruption. "Bit rot" in NAND flash is real - the stored charges will gradually leak out, and especially with the shrinking of each bit cell and MLC, even data that's sitting there dormant will gradually erase itself, so the firmware has to periodically rewrite these blocks, at the same time moving them around to do wear leveling (another thing that has been sacrificed in the higher-density shrinkage is write endurance).

Corrupting data or BMTs when powered off in the middle of a write is not such a bad thing compared to when that data happens to be the drive's firmware. In the former case, at worst you lose a superblock and the OS doesn't boot, but you can still recover from that fairly easily compared to the latter case, where the drive can stop responding entirely.


It's unlikely that the firmware region will be corrupted unless you are in the middle of a firmware upgrade. While the variety of SSD and HDD failures I've seen is large and such a failure is not impossible, it's one I've never seen, and it sounds rather improbable.


I would like to see a hybrid drive system where writes are first made to a journal which exists on a spinning platter, maintained in such a way that seek times are absolutely minimized. Then, the data is transferred to the flash side of the house.

This would result in a drive with fantastic throughput, except when immediately reading what was written. So long as there was a delay between writes and reads of the same data, it would look like a perfectly performing SSD. This could have fantastic properties for web applications. Given network latencies, many web applications could have considerable delays between the time data is written to disk and subsequently read.
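
A very rough sketch of that write path (names and structure are mine, purely illustrative; a real implementation would also need to serve reads from the journal and deal with durability):

  import queue, threading

  # Appends go to a sequential journal (standing in for the spinning
  # platter); a background thread migrates them to the "flash" store.
  class HybridStore:
      def __init__(self, journal_path="journal.log"):
          self.journal = open(journal_path, "ab")
          self.flash = {}                       # block -> data
          self.pending = queue.Queue()
          threading.Thread(target=self._migrate, daemon=True).start()

      def write(self, block: int, data: bytes) -> None:
          self.journal.write(data)              # fast sequential append
          self.journal.flush()
          self.pending.put((block, data))       # migrate later

      def _migrate(self):
          while True:
              block, data = self.pending.get()
              self.flash[block] = data          # off the hot path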


What you're describing is essentially how hybrid drives like the Momentus XT, as well as Apple's "Fusion Drive" software solution already work -- writes go to the HDD and are then pulled into the SSD for fast reads.

Also, I believe you could achieve this at a software level with ZFS, which I understand allows you to do things like allocate an entire SSD (or multiple SSDs) as a cache for other (presumably slower) drives.


writes go to the HDD and are then pulled into the SSD for fast reads.

With journalling?

Also, I believe you could achieve this at a software level with ZFS, which I understand allows you to do things like allocate an entire SSD (or multiple SSDs) as a cache for other (presumably slower) drives.

Careful reading, please. Journalling, combined with the high throughput of spinning platters when you can create situations that eliminate seek and rotational latency, is the key point here, not caching.


I guess I had trouble understanding your idea because it's contrary to one of the few principles in computer architecture that nobody has ever argued about.

You want to stick a spinning rust platter with moving parts in front of a solid-state memory that is several hundred percent faster for sequential tasks, and is an order of magnitude faster for random tasks?

Okay.


In general, Intel is a "class act".

A few years ago my employer worked with Intel on a joint IC project (not Flash). My overall impression was that Intel engineers were meticulous and smart. This internal culture probably applies to many different Intel divisions. So I'm not surprised that Intel SSDs are reliable.


I had an old Intel G1 SSD die during a power outage that went a little too long for my cheap UPS. Of course one event in uncontrolled conditions isn't meaningful. I still bought another Intel SSD because every Intel device I've owned was top notch.


Yesterday my Intel SSD 320 failed horribly and killed all the data on my RAID system. I still do not understand how it happened. I was sure my storage system was very safe.

My system was configured as a mirror RAID with two 1TB HDDs. The RAID was created with Intel Rapid Storage technology (my motherboard is Asus p8z77v-pro). The failed SSD was used as a RAID cache configured with Intel Smart Response technology.

I was wary of using an SSD directly as the main system drive, because I had heard of the "BAD_CTX 13F" error which happened with Intel 320 drives. My hope was that if the SSD was used as just a cache, then in case of SSD failure the data would still be safe. Since this error reportedly occurs only during power outages, I also set up a UPS. But all these precautions have not helped.

Yesterday I was surfing the web with Google Chrome, and my computer suddenly became unresponsive. At first only Chrome was unusually slow and other open programs worked normally, but within a few minutes the computer froze completely. I was forced to press reset, and upon restart Windows automatically entered a non-interruptible "recovery mode". After more than 24 hours the OS reported that "further recovery is impossible" and the RAID became unbootable. The SSD serial number had changed to "BAD_CTX 0000013F", the sign of the famous "8mb bug". It is interesting that in my case this bug was not caused by any power outage, unless pressing the "reset" button counts, but I don't think it counts as a power loss.

I took an HDD out of the RAID to connect it to another computer and save the critical data, but without any success. At first sight the whole file system looked correct, and I even managed to copy all the recent data files, but when I looked into those files they were a total mess. Each file consisted of arbitrary chunks of unrelated files, mixed in random order - a bit of some executable file, several lines of my project source code, followed by a chunk of some unknown XML configuration file, followed by random bytes, etc. A total mess.

I still don't understand the reason for such spectacular data corruption. I have three hypotheses: 1. The SSD cache sent incorrect data to the RAID on write (two months ago I switched the SSD cache from "enhanced" to "maximized" mode, in which writes initially go to the SSD and only then to the RAID disks). 2. The Intel RAID controller went crazy due to a programming error. 3. Windows corrupted the data during the non-interruptible "recovery" phase.

The moral is, even an Intel SSD with a UPS is not safe, and a mirror RAID cannot protect data from such errors.


It is good to see someone actually testing the power-loss protection claims made by manufacturers.

However, uninterruptible power supplies are usually a better investment than power-loss resilient storage media. The problem is, even if your SSD or hard drive behaves perfectly during a power-loss scenario, your server software may not. Almost every database, filesystem, etc. includes some amount of buffering in memory, because sending every write directly to disk is a performance killer.

Also, the best-case scenario with power-loss resilient media is that your system shuts down cleanly. With a UPS, you can keep the system up until diesel generators kick in, a much better endgame for everybody.

I once asked someone who had worked in the hard drive business what a hard drive would do when power was lost. "Try to park the drive head immediately before it crashes on to the platter," was the immediate response. Trying to flush the cache contents wasn't even remotely on his mind. In practice, losing power while writing to a hard disk does often corrupt sectors-- even sectors that weren't being written to during the power loss incident.

It's good to see that (some) SSDs are at least trying to flush the cache, but you really have to ask yourself: can you really trust the manufacturer's claims? And if you can trust them, can you trust your specific software configuration under this unusual scenario? I think it's just too long a frontier to guard with too few sheriffs. Dude, you're getting a UPS.


The problem with SSDs, as others have mentioned here, is that writing during power loss can have more serious effects: it can overwrite firmware bits or mapping bits. In the first case the whole SSD is dead; in the second case, much more data than the one currently written is lost.


My impression was that even cheap SSDs should have ultracapacitors or small batteries that allow them to survive a power loss event without being bricked. Of course, the stuff in the cache is lost at that point, but that's no worse than the situation with a hard drive. Also, as I mentioned, "much more data than the one currently written" can be lost when power fails in a hard drive. So the situation is really no different, unless the manufacturer screwed up.


Spinning disks have enough rotational momentum to keep spinning (which keeps the heads floating) for long enough to park the heads via a weak spring, with zero electricity. A head crash doesn't corrupt a few sectors so much as cause catastrophic damage - that disk would likely never read another sector again. Properly functioning spinning disks haven't had issues with random data loss on power failure for at least a decade now.

And your impression of cheap SSDs is dead, flat wrong. They're cheap - every unnecessary part is left off to save money. And we've all (all of us who pay attention) known for years that SSDs (even some with power fail protection) will lose data (even bits which it has reported to have sync'd) on power loss.

A UPS is not enough; if you need to have your data, you need multiple layers of backup, and an SSD must have some method of writing out volatile data (mostly internal metadata, not cache) before it shuts down.


Properly functioning spinning disks haven't had issues with random data loss on power failure for at least a decade now.

Source?

And your impression of cheap SSDs is dead, flat wrong. They're cheap - every unnecessary part is left off to save money. And we've all (all of us who pay attention) known for years that SSDs (even some with power fail protection) will lose data (even bits which it has reported to have sync'd) on power loss.

I think you misread what I wrote. I wrote that I would expect cheap SSDs to "survive a power loss event without being bricked." I did not write that they would retain all data, which seems to be what you are arguing against.

I have heard rumors that some cheap SSDs do not honor the SATA SYNC command. Unfortunately I do not have a reliable source for this theory, do you?

A UPS is not enough; if you need to have your data, you need multiple layers of backup, and an SSD must have some method of writing out volatile data (mostly internal metadata, not cache) before it shuts down.

I don't think anyone is arguing that a UPS is a replacement for backups.


It's long been conventional wisdom that you'd have to be crazy to buy anything but Intel when it comes to SSDs. This study isn't too surprising, in that regard.


I think it's more "buy only Intel if you have crazy reliability requirements and no periodic backups." For a typical user who just wants their stuff to load faster, it doesn't really matter: laptops shut down gracefully when they run out of power and power outages don't happen that often.


We only buy Intel for staff because their time is worth a lot, any interruption due to a flaky drive can be expensive (especially if traveling), and it reduces worry. The time for me to research whether another brand is reliable costs more than just choosing Intel. A no-brainer decision.

I.e. the savings on a cheaper drive are not worth the risks.

Backup, always, of course.


I've never had any issues with Samsung or Kingston SSDs. The 840 EVO series and Pro series are the best performance for the money you can get.


Kingston used to rebrand Intel SSDs (same controller, just a smaller amount of flash) for the "value market", so it's not surprising they were good.


The Samsung drives are poor when used on an NFS server with sync enabled - they are about 4x slower than the Intel 320 drives at that. When used without sync, they're the same if not faster than the Intel drives - with the drawback, of course, that you've probably got corrupt data if you crash or lose power.


Stick to the Pro. The EVO is based on TLC, which is cheaper to manufacture and has a reduced lifespan.


Their customer service could be better.

I had one that went bad in less than 24 hours (my only SSD that needed a warranty claim). I tried to return it to them - filled out the form, still had to call them and talk. In the end the guy suggested it would be better for me (much faster) to return it to Amazon than to work the return through Intel.


If I need customer service for a drive at all, my day is unlikely to get much worse.


The Crucial M4 had no power-loss protection. The newer model (M500) does and would be a more interesting test.


As many have pointed out: There are many applications where power cycling is extremely rare and also not a big deal.

1. Laptops have a built-in UPS in case they're unplugged.

2. Servers should have a UPS in case they're unplugged or there are small outages.

3. Desktop drives shouldn't be trusted as the only copy. Though I imagine the data corruption would propagate to backups?


Servers can go down more than you think. Rackspace had an entire wing of their Texas data center go down hard 3 times within a couple of months a few years back. I know because PortableApps.com's dedicated box was in that wing. It was an issue with the equipment that switches from external power to battery to generator power. And there was no secondary backup to that, so it would go down and take the whole wing all at once. This would corrupt our Drupal database and it would take about 8 hours to rebuild. So having a reliable SSD with power loss protection could be a lot more relevant than you expect.


Unless the server UPS is inside the chassis, disk corruption on power loss seems like a very high risk. Datacenter-wide UPSes can fail (at one of XO's DCs, we had power loss 3 times in 2 months). People can accidentally unplug stuff.


Laptops are vulnerable to this problem, if you don't know about it.

On my notebook I lost a partition with a Crucial M4. The OS hung, I did a hard reset, and after the reboot discovered data loss.


I got an Intel SSD (G1 80GB) in my laptop to do OS development (drivers), which incidentally is a use case that requires lots of hard resets. Never had any problems with it, and it's been through dozens if not hundreds of hard reset cycles. In fact I got the SSD specifically because I didn't want to be spinning up/down a disk that much, and laptops tend not to have reset buttons (they really need one, IMHO.)


Those tend to be very safe hard resets from a drive perspective. First of all, you're not losing power, so even though there's a reset, the drive firmware maintained power. Secondly, I'm guessing you see far more hard locks and manual resets than random, sudden reboots - and if that's the case, then the drive firmware probably didn't even notice. By the time you press reset, an eternity has passed and any ongoing activities have long finished.

I can imagine a software fault causing drive-level problems if the drive has a large cache and a broken fsync, or if the BIOS does some kind of unsafe hard drive reset very quickly after starting.

In any case, it's probably more likely to be file-system level reliability you'd need in the face of driver instability.


For me, the scariest/most interesting part was that some SSDs had problems with the tests in which power WASN'T interrupted.


> For me, the scariest/most interesting part was that some SSDs had problems with the tests in which power WASN'T interrupted.

OCZ drives had that problem. That's not a surprise.

There's a reason why Intel and Samsung drives consistently average 4.5 stars on Newegg, while OCZ drives consistently average 3-stars.


Would be nice to see this test for Samsung drives, which should be closer to Intel's quality.


Where are all the dates in this article? When were these drives purchased/manufactured?

Some (all?) of the 320 drives (pre-2012) had a bug that basically bricked the drive after power loss. See more on google at '8mb bug' and this Intel thread [1]. The existence of this bug, the well-known reliability reputation of Intel, and the sheer size of this sampling number (N=500?) make the distinction in time important. Were all these drives more recent, or is the Intel failure rate, even with buggy firmware, still ~.5% ?

>However, given that deployment of over 500 Intel 320 SSDs has been carried out and only 3 failures observed over several years, it would be reasonable to conclude that Intel S3500s could be trusted long-term as well

[1] https://communities.intel.com/message/133499


That bug was not as prominent as you make it sound - I was unable to reproduce the issue (or any other issue) across tens of thousands of reboots on the buggy firmware (running Linux and a quality SAS HBA.) The circumstances to produce a corruption were much more rare than simply "power loss," and many users with "safe" platforms could easily expect 0.5% AFR.

Even with "risky" OS and controller combinations, there was an element of probability involved, so most (probably almost all) power loss events would not hit the bug.

Plus the SSD320 was difficult to obtain back then, and reasonable operators upgraded to the firmware version with this bug fixed, so only a small percentage of the units were ever even vulnerable.


I really hate when people are sloppy with notation.

20GB = Twenty Gigabytes

NOT

20gb = Twenty gram bits

and:

20MB/s = Twenty Megabytes per second

NOT

20mbytes/sec = Twenty milli-bytes per second (if you got your B's and b's correct above, you wouldn't need to write out "bytes")


If you want to be really pedantic, differentiate SI giga/megabytes as GB/MB and "binary" gibi/mebibytes as GiB/MiB


This is actually a major annoyance in some fields - Xilinx for example likes to use GB to represent 1024^3 for storage on FPGAs while HDD manufacturers like to use 1000^3. IMHO [ZYEPGMK]?iB is the way to go to end this non-sense.


I'm old enough to remember what MB and GB meant before sleazy marketers started to redefine them. They should have been sued for deceptive advertising. Instead the computer press of the time was spineless, because guess who paid for the ads.

So I refuse to be a wuss and use MiB and GiB.

(Also, get off my lawn.)


Since mega- and giga- have well-established meanings as prefixes to measures, and the original common usages of MB and GB were inconsistent with those meanings, I prefer MiB and GiB for those uses, even though it took marketers using the correct versions for devious reasons to get terms popularized that distinguished the base-2 prefixes from the close-but-not-the-same base-10 prefixes.


I agree.

In the kilobyte world, 2.4% may not have been too big of a deal.

In the terabyte world, there's a 10% difference between binary and decimal prefixes. That's way bigger than rounding error. We need to start using the binary prefixes.
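
For reference, the gap between the decimal and binary prefixes at each step:

  # The gap between SI and binary prefixes grows with each step.
  for name, si, binary in [("kilo/kibi", 10**3,  2**10),
                           ("mega/mebi", 10**6,  2**20),
                           ("giga/gibi", 10**9,  2**30),
                           ("tera/tebi", 10**12, 2**40)]:
      print(f"{name}: {(binary / si - 1) * 100:.1f}% difference")
  # kilo/kibi: 2.4% ... giga/gibi: 7.4% ... tera/tebi: 10.0%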


So 1024 bytes is 1 kiB?



From your link it is KiB with a capital K.


I just use K, M, G, T to evade the whole ISO thing altogether.

Kibbles, mibbles, gibbles and tibbles just make me shake my head.


My favorite is "5KB". The prefix for kilo- is a lowercase 'k'. Uppercase 'K' is Kelvin. I don't even know how to conceptualize a Kelvin-Byte.


>I don't even know how to conceptualize a Kelvin-Byte.

I can actually see where this unit would be useful; in determining the probability of bit rot. The more kelvins you have, and the more bytes you have, the more likely you are to flip a bit due to random fluctuations. temperature*storage capacity = Kelvin-Bytes.


Once you have that figured out, try to conceptualize Kelvinbyteohms aka Kelvinbytes per Siemens!


I read through the report a couple times, but couldn't figure out how many drives he had in each test batch. I'm a little concerned that he just took a single drive, and hoped it would effectively represent the entire model.

Clearly there will be some variability - and he may have gotten a good, or bad drive - and a larger population of SSDs might behave quite differently in terms of reliability.


My brand new Crucial M4 SSD stopped working after a few days. To be precise, the system would show the drive for about 2 minutes after each reboot, and after that the drive would not show up anywhere at all - not in Windows Explorer, Device Manager, or anywhere else. I contacted their support team but they never replied.


Crucial reportedly has better support than that. My M4 would crash after 21 minutes (exactly-ish) after over a year of continuous service (100+GB/day); the latest firmware fixed the problem and increased speeds. The USB firmware updater was a godsend compared to the vile, vile hoops I had to jump through to upgrade my OCZ drive.

I currently have a 1TB Samsung 840 and it has been rock solid; it replaced a 250GB Intel 320. AFAIK Samsung is the only manufacturer that owns the whole supply chain, flash + controller.


Toshiba also do both NAND and controllers.


I got my Crucial M4 4 months ago and it stopped working too. The BIOS didn't show it. It got replaced with a new one. After 3-4 weeks it failed again. The BIOS shows the SSD but testing shows many errors. I will send it in again or will try updating the firmware.


I have a Viking VRFS21100GBCNS which supposedly features Super Capacitor power failure protection. Manufacturer PDF - https://docs.google.com/file/d/0B9JXvW974L4JUkxJMGJIUjNtRDg/...


I've found that the Intel 320 drives are much faster than the Samsung 830/840 and Crucial M500 when set up for sync writes on Linux (as an NFS server) - with the Intel drives being about 4x faster than the others in my testing.


Would be interesting to know how Samsung SSDs, very popular now and used in Apple's line, fare on that test.



