
Don't SSDs have a finite TBW? 50GB of writes every day (possible on large projects) will consume that in a couple of months.



I've had a Crucial 256GB SSD (MX100) since early 2015 and I use it with Windows 10. WSL 2's file system is on there along with Docker, which I've been using full time since then. That means all of my source code, installing dependencies, building Docker images, etc. is done on the SSD.

The SMART stats of the drive say it's at 88% health out of 100%, AKA it'll be dead when it reaches 0%. This is the wear and tear on the drive after ~6 years of full-time usage on my primary all-around dev / video creation / gaming workstation. It's been powered on 112 times for a grand total of 53,152 running hours, and I've written 31TB total to it. 53,152 hours is 2,214 days, or a little over 6 years. I keep my workstation on all the time short of power outages that drain my UPS or if I leave my place for days.
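
If anyone wants the back-of-envelope math on those numbers, here's a quick sketch (assuming wear scales roughly linearly with writes, which is a simplification):

    # Rough averages from the SMART figures above (31TB written, 53,152 hours on)
    total_written_gb = 31 * 1000
    days_powered_on = 53152 / 24                         # ~2,214 days
    print(round(total_written_gb / days_powered_on))     # ~14 GB/day on average
    # Naively extrapolating 12% wear per 31TB written:
    print(round(31 / 0.12))                              # ~258 TB before hitting 0%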

Here's a screenshot of all of the SMART stats: https://twitter.com/nickjanetakis/status/1357127351772012544

I go out of my way to save large files (videos) and other media (games, etc.) on a HDD but generally most apps are installed on the SSD and I don't really think about the writes.


As a counterpoint, I burn-tested several random M.2 NVMe drives over a month of 24/7 writes and reads, and all but one model failed before the month was up.


Heat dissipation can be an issue. Writing continuously generates a lot of heat.


Part of the purpose of a burn test is to see how it handles temperature under load. We didn't have the option of adding cooling, many of the product installations took place in a hot climate, and nobody wanted to pay for a hardened part...

Anyway, my point is that SSD reliability varies wildly.


NVMe does; SATA SSDs put through this burn test will probably work fine.


Something like the Samsung 960 EVO (a typical mid-range SSD) will take 400TB of writes in its lifetime. So that's 8,000 days of 50GB writes.
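
Spelling out the arithmetic, in case anyone wants to plug in their own numbers:

    endurance_gb = 400 * 1000        # rated for ~400TB of writes
    daily_writes_gb = 50
    print(endurance_gb / daily_writes_gb)          # 8000 days
    print(endurance_gb / daily_writes_gb / 365)    # ~21.9 years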


Hmmm. Looks like I need to move my temp dir for GeForce Instant Replay off of my SSD. It records about 1.6GB per 5 minutes, which is about 460GB per day. A RAM disk would probably be the best option.
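
Rough math on that rate, assuming it's actually recording around the clock:

    gb_per_clip = 1.6                # every 5 minutes
    clips_per_day = 24 * 60 / 5      # 288
    print(round(gb_per_clip * clips_per_day))      # ~461 GB/day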


I'm pretty sure it doesn't record the desktop with Instant Replay unless you have specifically set that up, so you'd only be writing to the drive while you're in game.


True, true.


At my current job I do at least 150-200GB of writes per day: 50GB for code temporary files, 70+GB for data files, 2x that for packing them, and then also deleting some of those to make room to do it again.

Also, the disk it's on is over 50% full, which degrades it faster since there are fewer blocks to wear-level with.


The larger your SSD the more flash cells you have, so the more data you can write to it before it fails.

You can see this from the warranty for example, which for the Samsung 970 EVO[1] goes linearly from 150TBW for the 250GB model up to 1200TBW for the 2000GB model.

So if you take the 1000GB model with its 600TBW warranty, you can write 50GB of data per day for over 32 years before you've exhausted the drive's write warranty.
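
A quick sketch of that calculation (the 500GB figure below just follows from the same linear scaling):

    # 970 EVO TBW warranty per capacity, in GB written
    tbw_by_capacity_gb = {250: 150_000, 500: 300_000, 1000: 600_000, 2000: 1_200_000}
    for capacity_gb, tbw_gb in tbw_by_capacity_gb.items():
        print(capacity_gb, tbw_gb / capacity_gb)   # ~600 full drive writes per model
    print(600_000 / 50 / 365)                      # 1TB model at 50GB/day: ~32.9 years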

[1]: https://www.samsung.com/semiconductor/minisite/ssd/product/c... (under "MORE SPECS")


They do, but it's really large. The Tech Report did an endurance test on SSDs 5-6 years ago [0]. The tests took 18 months before all 6 SSDs were dead.

Generally you're looking at hundreds of terabytes, if not more than a petabyte in total write capacity before the drive is unusable.

This was for older drives (~6 years old, as I said), and I don't know enough about storage technology and how far it's come since then to say for sure, but I imagine things probably have not gotten worse.

[0]: https://techreport.com/review/27909/the-ssd-endurance-experi...


> things probably have not gotten worse.

I'm afraid they did: consumer SSDs moved from MLC (2 bits per cell) to TLC (3) or QLC (4). Durability dropped from multiple petabytes to the low hundreds of terabytes. Still a lot, but I suspect the test would be a lot shorter now.


The TLC drive in that test did fine. 3D flash cells are generally big enough for TLC to not be a problem. I would only worry about write limits on QLC right now, and even then 0.05 drive writes per day is well under warranty limits so I'd expect it to be fine. Short-lived files are all going to the SLC cache anyway.
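
(For reference, that 0.05 figure is just 50GB/day against a 1TB drive; swap in your own capacity:)

    daily_writes_gb = 50
    drive_capacity_gb = 1000         # assuming a 1TB drive
    print(daily_writes_gb / drive_capacity_gb)     # 0.05 drive writes per day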


I've swapped hundreds of terabytes to a terabyte SSD (an off-the-shelf cheapie) with no noticeable problems (the gigapixel panoramas look fine).


SSDs avoid catastrophic write failures by retiring damaged blocks. Check the reported capacity; it may have shrunk :) Before you ever see bad blocks, the drive will expend spare blocks; this means that a new drive you buy has 10% more capacity than advertised, and this capacity will be spent to replace blocks worn by writes.


SSDs never shrink the reported capacity in response to defective/worn-out blocks. Instead, they have spare area that is not accessible to the user. The SMART data reported by the drive includes an indicator of how much spare area has been used and how much remains. When the available spare area starts dropping rapidly, you're approaching the drive's end of life.


I happened to know this already, but I must say I've always found the approach somewhat weird. Wouldn't it make more sense to give the user all the available space, and then remove capacity slowly as blocks go bad? I guess they think people would be annoyed?

Imagine if we treated batteries like SSDs, not allowing the use of a set amount of capacity so that it can be added back later, when the battery's "real" capacity begins to fall. And then making the battery fail catastrophically when it ran out of "reserve" capacity, instead of letting the customer use what diminished capacity was still available.


Shrinking the usable space on a block device is wildly impractical. The SSD has no awareness of how any specific LBA is being used, no way to communicate with the host system to find out what LBAs are safe to permanently remove. You can't just incrementally delete LBAs from the end of the drive, because important data gets stored there, like the backup GPT and OS recovery partitions. Filesystems also don't really like when they get truncated. Deleting LBAs from the middle of a filesystem would be even more catastrophic. The vast majority of software and operating systems are simply not equipped to treat all block devices as thinly-provisioned. [1]

And SSDs already have all the infrastructure for fully virtualizing the mapping between LBAs and physical addresses, because that's fundamental to their ordinary operation. They also don't all start out with the same capacity; a brand-new SSD already starts out with a non-empty list of bad blocks, usually a few per die.

Even if it were practical to dynamically shrink block devices, it wouldn't be worth the trouble. SSD wear leveling is generally effective. When the drive starts retiring worn out blocks en masse, you can expect most of the "good" blocks to end up in the "bad" column pretty soon. So trying to continue using the drive would mean you'd see the usable capacity rapidly diminish until it reached the inevitable catastrophe of deleting critical data. It makes a lot more sense to stop before that point and make the drive read-only while all the data is still intact and recoverable.

[1] Technically, ATA TRIM/NVMe Deallocate commands mean the host can inform the drive about what LBAs are not currently in use, but that always comes with the expectation that they are still available to be used in the future. NVMe 1.4 added commands like Verify and Get LBA Status that allow the host to query about damaged LBAs, but when the drive indicates data has been unrecoverably corrupted, the host expects to be able to write fresh data to those LBAs and have it stored on media that's still usable. The closest we can get to the kind of mechanism you want is with NVMe Zoned Namespaces, where the drive can mark individual zones as permanently read-only or offline. But that's pretty coarse-grained, and handling it gracefully on the software side is still a challenge.
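
If it helps to see the shape of that indirection, here's a toy sketch (nothing like a real FTL, just an illustration of why retiring blocks never changes the reported capacity):

    # Toy model: a fixed set of user-visible LBAs mapped onto physical blocks,
    # with hidden spares that take over as blocks wear out.
    class ToyFTL:
        def __init__(self, user_blocks, spare_blocks):
            # LBA -> physical block; the number of LBAs never changes
            self.mapping = {lba: lba for lba in range(user_blocks)}
            self.spares = list(range(user_blocks, user_blocks + spare_blocks))

        def retire(self, lba):
            """The physical block behind this LBA wore out: remap it to a spare.
            (A real drive would also copy any surviving data over.)"""
            if not self.spares:
                raise RuntimeError("out of spares -> drive locks read-only")
            self.mapping[lba] = self.spares.pop()

    ftl = ToyFTL(user_blocks=100, spare_blocks=10)
    ftl.retire(42)               # invisible to the host
    print(len(ftl.mapping))      # still 100 LBAs reported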


That makes a lot of sense, thank you for the detailed answer!


I can imagine; OSes in general are not prepared for the reported capacity shrinking. There would be a hole.

What if a moment ago my OS still had 64G and then all of a sudden it only has 63G? Where would the data go? Something has to make up for the loss.

To me it makes sense to report 64G logically and do the remapping magic internally.

I wonder how OSes deal with a hot-swap of RAM. You have a big virtual address space and all of a sudden there is no physical memory behind it.

Hm.


Now I'm curious how the kernel responds to having swap partitions resized under it.


Not quite how it works. The SSD will never have a capacity lower than its rated capacity.

For the failures I've seen, once the SSD goes to do a write operation and there are no free blocks left, it will lock into read-only mode. And at that point it is dead. Time to get a new one.


Now I'm really curious. I took the drive I swapped hundreds of terabytes to and put it in a server (unfortunately not running a checksumming filesystem), and it ran happily for a year.



