It sounds easy but it's not: imagine you're the software controlling the SSD. Before you write, you don't know whether the write will succeed. Once you've written, exactly the information that holds the pointers to the valid data could be what gets destroyed. The "raw" data you can still access may then be "out of order," but that would still be better than nothing. I can imagine a special "recovery mode" that would let the user rescue the data blocks even with the uncertain order, in case they are willing to piece some of them together. But almost certainly there aren't many people willing to pay for that.
"Read only" after some fixed number of writes would be safer. But then the complaints would be "why can't they just let me write as much as I can, I have a backup somewhere anyway?" Which is also a valid wish. So the best option would be letting the user select the mode.
And what about the drive that went into read-only mode, and then on reboot bricked itself? By design?
I can easily see someone having their computer lock up, restarting, and losing their data, as restarting is often one of the first debugging steps for so many things.
You know, these drives have their own CPU and RAM, and run their own software. The software keeps a copy of the table needed to do proper reads in RAM and writes it back to flash in response to normal data writes. When the flash fails, the software gets notified. It can still serve reads from the info in RAM as long as that info remains in RAM, so it enters read-only mode. After a reboot there are only the bad bits on the flash; the RAM content is completely wiped by the reboot.
So have a small area of flash that's only used to dump the RAM to at EOL. I, for one, would much rather have a little bit less space available, with a better chance of not bricking at EOL, than the current situation.
Looks like a good idea to me; the question is how small the area could be. I think I've read somewhere that writing that data takes some 40 seconds on some Samsung drive. At write speeds around 400 MB/s, that's approximately 16 GB reserved just so the user can still read a failed drive, one that had notified the user long before that the rated write count was already spent. Somebody who actually has better (industry insider) info is welcome to correct me.
>As restarting is often one of the first debugging steps for so many things.
If you're the kind of person who can troubleshoot a faulty drive, then you're not the kind of person who should reboot as a first step towards debugging. Especially if you suspect a faulty drive.
I suspect that most people, when a computer freezes, don't instantaneously suspect a faulty drive. At least not before they try other things, a large chunk of which require reboots.
>It sounds easy but it's not: before you write, you don't know if the write will succeed. Once you've written, exactly the information that gives the pointers to the valid data could be the one that is destroyed.
How is that different from any other write the drive had experienced though?
Every single other previous write had a way of retrying by way of spare flash cells. That last write is fundamentally different, because there are none left.
Intel drives died after a power cycle because of a buggy SandForce chip/firmware. Intel's bullshit fairytale was a post-facto rationalization. There is NO UNIVERSE where bricking a user's product full of data is a good thing, especially if you promised a read-only failure mode in the documentation.
Everything else should still be fine though. And the flash that holds the pointers (redirection table for wear leveling) can be a slightly higher quality flash that lasts just a little bit longer.
When the SSD bricks itself, it's the redirection table that is corrupted: random bits are wrong, and once those bits are wrong, the redirection is wrong. If the redirection table lives in separate flash, you have to spend more flash per byte on the table than on the data, because that table is heavily stressed (the redirections have to be updated often with the writes if you want good wear levelling). If you offload part of the redirection table to the "normal" flash, you're back at the beginning: some random bits will be wrong after some write, and you depend on exactly those bits to be able to read properly.
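The dependency described here can be sketched with a toy flash translation layer (purely illustrative, not any vendor's actual design): the data pages survive just fine, but without the map there is no way to tell which page holds the current version of which logical block.

```python
# Toy flash translation layer (FTL) sketch. Hypothetical structure,
# not real firmware: logical block addresses (LBAs) map to physical
# flash pages; every write goes to a fresh page and updates the map,
# so the map churns far more often than any single data page.

flash = {}      # physical page number -> data
ftl_map = {}    # logical block -> physical page
next_free = 0

def write(lba, data):
    global next_free
    flash[next_free] = data    # program a fresh page (wear levelling)
    ftl_map[lba] = next_free   # the map entry is rewritten on EVERY write
    next_free += 1

def read(lba):
    return flash[ftl_map[lba]]

write(0, b"hello")
write(0, b"world")             # same LBA, new physical page, map updated again
assert read(0) == b"world"

# If the persisted copy of ftl_map is corrupted, the data pages are still
# physically intact, but nothing says which page belongs to which LBA:
# the drive looks "bricked" even though the user's bytes are all there.
ftl_map.clear()
assert len(flash) == 2 and ftl_map == {}
```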
And as for power outages and hard resets, there are these nifty things known as capacitors. One only needs to last as long as is necessary to write the data to flash.
A better option would be to spend some time repeatedly writing down the pointers in newer and newer iterations every N blocks, so that a fallback is available if one block holding the pointers fails.
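A minimal sketch of that scheme (illustrative names, not a real firmware API): each copy of the pointer table carries a sequence number and a checksum, and recovery scans all copies and takes the newest one that still verifies.

```python
# Checkpointed pointer-table sketch: periodically persist the table with a
# sequence number and CRC, so recovery can fall back to the newest intact copy.
import json
import zlib

checkpoints = []   # stands in for a reserved flash region

def checkpoint(seq, table):
    blob = json.dumps(table).encode()
    checkpoints.append({"seq": seq, "crc": zlib.crc32(blob), "blob": blob})

def recover():
    # Ignore any copy whose checksum fails; prefer the highest sequence number.
    valid = [c for c in checkpoints if zlib.crc32(c["blob"]) == c["crc"]]
    if not valid:
        return None
    return json.loads(max(valid, key=lambda c: c["seq"])["blob"])

checkpoint(1, {"0": 10})
checkpoint(2, {"0": 11, "1": 12})
checkpoints[-1]["blob"] = b"garbage"   # simulate the latest write dying mid-flight
assert recover() == {"0": 10}          # the older but intact copy wins
```

The trade-off is the one raised above: each checkpoint is itself a write, so the reserved region wears quickly unless it is rotated across many blocks.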
Considering spinning rust used to fail catastrophically, sometimes very quickly, I'm really not bothered.
Plan for the worst, hope for the best.
In fact my personal advice is to plan to lose everything since your last backup. If you can't afford that risk, back up more often.
No disk technology is perfect, nor directly comparable. I consider this purely a 'value for money before it blows up' assessment. The results make me happy, as I've been recommending and using 840 Pros for a while now.
...and which magical hard drive do you buy that guarantees it will fail gracefully? Customer hard drive reviews are littered with sudden drive deaths and DOAs. If the electronics or motor go on a hard drive, the data is practically toast unless you're ready to pay out the nose for exotic data recovery.
This is why i really wish that some kind of "floppy" had kept up with HDD capacities. There the drive electronics etc were by design separated from the storage media. As such, having the control board or similar keel over just meant getting a new drive.
Isn't MicroSD what you want? Even when hard discs were tiny they were bigger than any floppies. Zip drives were possibly the closest, but they were only used because they were portable, not because they were any good.
Zip drives could also go bad and screw up your data if the mechanism failed. Stupid click of death... a completely separate storage layer can't always save you.
SDs still have a level of electronics between the storage IC and the rest of the system. There may have been a solid state format like what i'm thinking about, but i think it is long defunct.
There was/is a packet writing format that can be used, as far back as CD-R i think, but early Windows needed a special driver just to read them. Supposedly Universal Disk Format (UDF, DVD and later) has it baked in. But i can't say i have ever seen it used.
It is kind of like when you have a power loss while writing to a file system (say FAT32 or ext2): when it comes back up you have to run fsck or equivalent, and if something got messed up it may not come back in a consistent state. Something similar happens at the SSD level. It's one of the most complex aspects of an SSD to implement and test.
Not sure, but I'm imagining it's to avoid someone throwing out a read-only drive filled with sensitive data (and unable to erase it through conventional means - that is, writing to the drive)
Of course, there are other solutions to this problem: sledgehammer, etc
A bricked device still has the data on it. It's fake security.
And you could allow secure erase while still prohibiting normal writes. The cells aren't actually gone or unusable after all, just unreliable. A secure erase would work fine on them.
Exactly. SSDs' sudden death is the major reason I dare not use them for any serious purpose at the moment. A good backup scheme does help, but a sudden death is still too bitter a taste in reality.
This is why, for anything I care about, I have them in RAID1 or similar. I have a home server with a pair of SSDs for the system volumes (and hosting a few VMs: mail server, web/app servers, play things, ...) and a pile of spinning metal in RAID5 (I'm considering moving to 6 because of the "new error during rebuild as rebuild takes ages on large drives" issue) for media storage.
One of the SSDs died a couple of weeks ago (unexpectedly, Linux didn't fail safe properly: it fell over trying to read from the broken one and didn't seem to try reading from the other instead, though upon coming back up it ran from the good drive OK) and RAID saved me much hassle rebuilding from the previous night's backups.
The SSDs in my desktop box aren't RAIDed, nor in my laptop (two wouldn't fit), but I can easily survive either of those being out of action for a couple of days, and my backup regime involving copies on the local file server and on remote hosts (fingers crossed) should mean nothing gets lost if they do die.
I've not found the failure rates of SSDs to be any different from spinning media. I don't trust important data to a single drive of either type.
That title seriously misrepresents the findings. These drives lasted orders of magnitude longer than they were rated for. For 99% of people on the internet, you could write all your daily traffic to any one of these drives for the rest of your life (assuming it stayed constant).
Headline is accurate. A cursory glance at the article gives you a clear picture of what the drives were put through. I'm happy to know that SSDs can handle as much r/w as they can.
Really impressive showing from all drives involved. I would have liked to see at least one of the drives fail to read-only, but it's nice to know that at the rate I use SSD sectors I should be OK for a pretty good while. While there's no great way to test this other than wait and see, I wonder if age AND total writes has an appreciable difference vs. marathon testing like this.
Disappointed that the Crucial MX100 / MX200 wasn't included in this. We've been using Crucial exclusively for just over 5 years with customer laptops: we swap the hard drives out for Crucial SSDs when setting them up for the first time. In 5 years, with average use, we've only ever had a single failure... out of literally hundreds of laptops we set up per year.
The 840 Pro that lasted more than 2 PBW was rated for 73TBW.
>With twice the endurance of the previous model, the 850 PRO will keep working as long as you do. Samsung's V-NAND technology is built to handle 300 Terabytes Written (TBW)* which equates to a 40 GB daily read/write workload over a 10-year period. Plus, it comes with the industry's top-level ten-year limited warranty.
* 840 PRO: 73 TBW < 850 PRO: 150 TBW
850 PRO 120/250 GB : 150 TBW, 500/1TB(1,024 GB) : 300 TBW
The drive has 400 MB/s on the interface. That means it's possible to deliver some 34 TB per day to the disk, so the rated 300 TB are reached in less than 9 days. But the scariest effect is "write amplification": delivering small chunks of data through the 400 MB/s interface can result in much more data actually being overwritten on the flash (as the atomic write size on the flash is quite big).
Of course an average consumer can't produce that. A lot of the data I write to any medium at home is never overwritten. But not every use is the use of an average consumer, knowing the actual limits is important.
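The arithmetic above checks out as a quick back-of-envelope calculation (the host-write and page sizes below are assumed round numbers, chosen only to show the amplification effect):

```python
# Back-of-envelope check of the figures above (assumed values, not measurements).
iface_mb_s = 400                                # sustained interface speed, MB/s
daily_tb = iface_mb_s * 86_400 / 1_000_000      # ~34.56 TB delivered per day
rated_tbw = 300                                 # Samsung's 850 PRO 1TB rating
days_to_rating = rated_tbw / daily_tb           # under 9 days at full tilt
assert 8 < days_to_rating < 9

# Write amplification: a small host write can force a whole flash page to be
# rewritten, so the NAND wears faster than the host byte count suggests.
host_write_kb = 4      # assumed typical small host write
flash_page_kb = 16     # assumed flash page size
worst_case_amplification = flash_page_kb / host_write_kb   # 4x in this toy case
```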
That's why enterprise SSDs are rated in full drive writes per day rather than GB per day. If you are going to write at 400 MB/s for 24 hours a day, you need an enterprise SSD, not a consumer one.
They ALL are; every single SSD has this magical, pulled-out-of-the-ass ~70TB number. And don't expect it to go up any time soon: every new generation of flash has fewer write cycles, which cancels out any gains you might get from greater capacity.
Very impressive - I've been following the story since the start with quite a vested interest - I'm currently building mixed tier SSD only SANs, one tier consisting of high end PCIe NVMe SSDs, the other consisting of Sandisk Extreme Pro III 'consumer - available' drives.
Inspired by some of their earlier articles, we did a similar test in-house on an ssd model we were planning on deploying into production. The goal was to make sure we knew exactly what to look for when our ssds were on their last legs. We actually bought a smaller capacity version of the same model so the test wouldn't take as long.
I feel a lot better about that production hardware now that I have seen first hand what SMART reports as the drive is running out of reallocation sectors.
>I would love to see how this stacks up to enterprise SSDs.
The difference is the number of writes an individual flash cell can take. Consumer multi-level cell (MLC) flash takes 1,000-3,000 writes (thus 240GB disks start to fail around 600-700TB written), whereas enterprise SSD flash, single-level cell (SLC), takes up to 100,000 writes.
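As a rough sanity check of those numbers (idealized assumptions: perfect wear levelling and a write amplification of 1):

```python
# Idealized endurance estimate: every cell is worn evenly and each host byte
# costs exactly one flash byte. Real drives do worse (write amplification)
# and sometimes better (spare area, strong ECC late in life).
def rated_writes_tb(capacity_gb, pe_cycles):
    """Total TB writable before the rated P/E cycles are exhausted."""
    return capacity_gb * pe_cycles / 1000

# 240 GB of consumer MLC at 2,500-3,000 P/E cycles lands right around the
# 600-700 TB range where the tested drives started to fail:
assert rated_writes_tb(240, 2500) == 600.0
assert rated_writes_tb(240, 3000) == 720.0
```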
Disappointing he didn't test the new Samsung 850 SSD. The old 840 is known not to have such high endurance, so I'm automatically excluding it from my list of choices.
They read and wrote data almost constantly at full speed for 18 months, which is very likely not what normal people using these drives would do with them. Please read the article.
The SSD Endurance Experiment represents the longest test TR has ever conducted. It's been a lot of work, but the results have also been gratifying. Over the past 18 months, we've watched modern SSDs easily write far more data than most consumers will ever need.
Not sure what you're going on about here. The cheap EVO wrote at least 100TB, the PRO wrote 2.4PB. After working on home machines for years you rarely see drives that have anywhere near 100TB written in SMART. If you are writing that much data get the PRO edition and the warranty that comes with it.
Maybe they got lucky. They only tested one of each of these drives, and there was already a case (with the Kingston drives) where nearly identical models got wildly different lifetimes.
The low number of reallocated sectors suggests that the NAND deserves much of the credit. Like all semiconductors, flash memory chips produced by the same process—and even cut from the same wafer—can have slightly different characteristics. Just like some CPUs are particularly comfortable at higher clock speeds and voltages, some NAND is especially resistant to write-induced wear.
The second HyperX got lucky, in other words.
So it's definitely possible that the 840 Pro has reliability problems even though one of them won this test.
Thank you for this! Yours is the most astute comment I've seen here. As someone who's worked in flash memory manufacturing, I know how much variation is inherent in the process. It gets evened out somewhat by binning, but there's inevitable variation in dielectric thickness, aspect ratios, corner profiles, dopant diffusion, grain structure of polysilicon, and on and on, even discounting the density of particulate defects and watermarks.
Even if you don't know that, basic stats should tell you that testing a single unit from each manufacturer is going to be meaningless at best, misleading at worst.
It's less expensive than testing a dozen drives of each model, and better than testing no drives at all. It's not meaningless at all; it just has to be read with the right caveats.
I am NOT impressed with the behavior at end of life. In fact it makes me wary of using any SSD.
A bad SSD should go read only, and stay that way. Not self-brick, and make the data unrecoverable.