Western Digital HDD boss mentions archive disk drive idea (blocksandfiles.com)
87 points by walterbell on Dec 14, 2021 | 117 comments



Once the latency requirement goes past ... i dunno, a few seconds? "Insh'allah" time frames; tape comes in and stomps all over the competition with its space/cost tradeoff. Dismountable media and a mechanism to swap tapes (even a meatsicle if need be) just have too many advantages to compete with.

Gets me thinking tho: Why stop at 5.25in drives? Let's go back to washing machine sized units or even bigger. Let's stack platters so dense and heavy that we can use the spindle as a flywheel UPS system too.

Or how about making a truly gigantic tape spool? Perhaps kevlar threads impregnated with iron could serve as "tape" and we could spin kilometers of that onto a spool.


Tape also has some inherent disadvantages. Perhaps surprisingly, the lack of near-instantaneous seek to the data you want isn't one of them. The larger issue at scale is that it's not disk. It's a separate kind of device, with its own power and (often very stringent) environmental requirements, and its own software. Also big silos aren't cheap, both because of inherent cost and for lack of significant competition. Hyperscalers hate that type of heterogeneity, so they'll only go down that road if they feel they have no other choice. Yes, media are cheap, but the extra operational complexity can obliterate that advantage.

A lot of this doesn't apply as you go smaller, but also the drive cost starts to dominate (can't be amortized over many cheap tapes). There's a pretty narrow band where tape is likely to have a real advantage over super-high-capacity disks.


Why can't tape be self-contained like HDDs?

Aside from the fact that most people using tape for archival storage don't want to pay extra for the read/write heads, SATA interface, etc., there is no reason why you couldn't package all these things into a self-contained tape unit with a small flash device acting as a cache and holding the directory listing.

You could definitely package such a thing for consumers, for example, but most workloads there aren't a great fit for the medium. Basically the only thing that makes sense is using it for archival and backups.


> Why can't tape be self-contained like HDDs?

Because that gets rid of the main advantage of tape. Which is that the tape-media has no read/write head and is therefore much much much cheaper to mass produce.

---------

In practice, people buy tape-libraries entirely. Like a 3U unit with 50-tape slots + a few drives to read/write to those tapes, and then hook them up to the network.

https://www.quantum.com/en/products/tape-storage/

From this perspective, you buy as many tapes as you want storage (aiming for 500TB? Buy like 40 LTO7 tapes for your library. Aiming for 1000TB? Buy 80 LTO7 tapes for your library, assuming compression of course).

From there, you just read/write to the library, and have the underlying software handle the details, like any other NAS.
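
A rough sketch of that sizing math in Python; the 6 TB native / 2.5:1 compression figures are the commonly quoted LTO-7 numbers, and the 500TB target is just an example:

  import math

  LTO7_NATIVE_TB = 6        # raw capacity of one LTO-7 cartridge
  COMPRESSION_RATIO = 2.5   # vendor-assumed average compressibility

  def tapes_needed(target_tb, assume_compression=True):
      """How many cartridges to buy for a given amount of usable capacity."""
      per_tape = LTO7_NATIVE_TB * (COMPRESSION_RATIO if assume_compression else 1)
      return math.ceil(target_tb / per_tape)

  print(tapes_needed(500))         # 34 tapes if your data compresses at 2.5:1
  print(tapes_needed(500, False))  # 84 tapes if it doesn't compress at all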


> assuming compression of course

I don't know why this seems to be the standard practice in the industry, but it really annoyed me when I realized a “15TB” LTO-7 tape actually has only 6TB of real, “native” storage, because it assumes some average compression ratio.

Why is this acceptable? What if I use the tape to store incompressible data like video and images? Feels like intentional cheating.


Meh. It's only cheating if the manufacturer keeps it a secret, and they don't.

When a company is spending >$20k on a tape system, the people in charge of buying it will talk to the sales people, tell them the use case, and get a more accurate estimate.


If the buyers are engineers that understand the concept of storage space measured in bytes, this 'estimate' does not help them. If they don't, it only serves to mislead.

The fact that buyers may talk to salespeople is really not an excuse for the deceptive behavior.

"well, in our restaurant medium rare means well done and rare means medium rare, but that's okay our customers are well-paid professionals, they'll talk to waiters to get a more accurate picture"


Tangent: That's how most restaurants operate, because most people don't know what the words mean and get upset when they ask for "rare" and get what they asked for.

Clothing manufacturers also lie about the waist measurements of pants. (Go measure yours and see.)


Funnily enough the first hard drives were like this too. They were a round pizza-sized (but a few inches thick) replaceable cartridge and could be inserted into a 'reader' which contained the heads.

But I guess with the data density required these days and the extreme closeness of the head to the platters, this won't work anymore, as dust would get in.

But this is how the first hard drives worked, we had some with our pdp-11 at the computer museum.


If you're actually hyperscale, say 100 EB of tape, then you'll have dedicated teams just for tape archives. The heterogeneity doesn't seem like much of a problem when you have 200 engineers x 300k USD per engineer to spend on managing the system.

If anything, we're seeing hyperscalers becoming more heterogeneous with CPUs. Google has training and inference TPUs, custom silicon for encoding media, and many more custom CPUs. It makes sense for storage if the benefit is there.


Those are completely different levels of heterogeneity. A GPU/TPU might draw a lot of power, but does it have to be separate power? Do you need to put that stuff on a separate slab to isolate it from vibration? Does it need its own air filtration system? Do you need a new runbook, because your usual redundancy approaches can't be used for million-dollar systems? Do your DC techs need new training, because the usual "yank and replace" isn't applicable either?

I've watched colleagues in a team adjacent to my own work through these issues. Have you? Slapping a commodity card into a box and loading some commodity software on it seems like a cakewalk by comparison. Obviously people thought it was worth it anyway, but I think you're seriously misunderstanding where the difficulties lie and what kind of resources are needed to overcome them. 200+ engineers would be overkill (unless FB engineers are ~4x more productive than Google engineers) but you'll need other kinds of specialists as well and the installation/operational costs will still be high. It's unlikely to be worth it for an organization much smaller or different.


Yes, these factors would probably limit massive tape archives to only the biggest companies.


Tape has an inherent density advantage over hard drives because tape stores data in the volume - tape is a 3D storage technology. Hard drives are limited to recording on a handful of slices through a cylinder inside them; tapes record on the peeled surface of a cylinder roughly the same size. One tape cartridge contains more than 30 times as much surface area as the highest-density hard drives for recording data.
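
Rough numbers in Python, with the dimensions as stated assumptions (an LTO-8-length reel, a 9-platter 3.5" drive); the exact ratio depends on how much of the tape width is actually usable for data, so the "more than 30 times" above is the conservative end:

  import math

  # Tape: roughly LTO-8 reel dimensions (assumed for illustration)
  tape_length_m = 960
  tape_width_m = 0.01265
  tape_area = tape_length_m * tape_width_m                        # ~12 m^2

  # Disk: a 9-platter 3.5" drive, both surfaces of every platter used
  outer_r, inner_r = 0.0475, 0.025                                # metres, rough
  platters = 9
  disk_area = platters * 2 * math.pi * (outer_r**2 - inner_r**2)  # ~0.09 m^2

  print(f"tape has ~{tape_area / disk_area:.0f}x the recording surface")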


The trade off to all that capacity is latency. But for archival purposes, it doesn’t matter if it takes 2 seconds or 2 minutes. As long as the writes are quick enough to keep up with the data.


And tape has (historically at least; I haven't checked in for a few years) been faster at purely streaming loads than spinning rust. Pre-SSDs, there'd be systems that streamed to tape to keep up with input data, then would offline-load that into spinning-rust databases to crank numbers where seeking dominated.


Yup. A well-known issue with tape is that machines have a hard time supplying it with data fast enough to keep up with its linear travel speed when creating the backups.


I believe that LTO can actually regulate the speed of the tape.


It can, within reason. Also source machines (and more importantly, source media) are much faster in the first place this side of 2010.


Yup. I did some tests and my SSD (and even HDD) can handily keep up with the data rate. If not, I can always add a small staging drive the size of a tape to pre-prepare the data for writing.


> Gets me thinking tho: Why stop at 5.25in drives? Let's go back to washing machine sized units or even bigger. Let's stack platters so dense and heavy that we can use the spindle as a flywheel UPS system too.

Because modern 20TB hard drives already take like a full week to read from beginning to end (aka: time for a RAID6 rebuild), which is too long as it is.

The problem with hard drives is that they need "more read heads per byte" than current designs provide. You can solve this by using more hard drives (ex: RAIDing together smaller drives, say 4x5TB), so you have more read/write heads going over your data faster.

There are multi-actuator hard drives coming up (two independently moving actuators per drive), which will report themselves to the OS as basically two hard drives in one case. I think that's where the storage market needs to go.
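
A back-of-the-envelope version of the read-time problem, assuming ~200 MB/s average sequential throughput (a made-up but plausible figure for a large drive):

  capacity_tb = 20
  avg_throughput_mb_s = 200   # assumed average across the whole platter

  hours_per_pass = capacity_tb * 1e6 / avg_throughput_mb_s / 3600
  print(f"{hours_per_pass:.0f} hours for one full sequential pass")    # ~28 h

  # A RAID6 rebuild under production load runs several times slower than a
  # clean sequential pass, which is how you end up in multi-day territory.
  # Two independent actuators roughly halve the time per pass, since each
  # one only has to cover half the capacity.
  print(f"{hours_per_pass / 2:.0f} hours with two independent actuators")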

------------

Tape is the king of density: the most bytes of storage on the fewest read/write heads possible. But hard drives encroach upon that space, especially with something like 20TB behind a single actuator.

> Or how about making a truly gigantic tape spool? Perhaps kevlar threads impregnated with iron could serve as "tape" and we could spin kilometers of that onto a spool.

There's no advantage to that. A stack of 300 LTO tapes will take up practically the same amount of space as one tape that's 300x longer. (Besides, LTO-8 tapes are 960 meters long, roughly 1km. The idea of pulling on a thing that's 300km long and hoping it doesn't rip itself apart is... pretty crazy. There are definitely physical constraints on practicality here, just from the materials science: shear/stress kinds of calculations.)

The "jukebox" concept... really a tape-libraries (robot that picks tapes out of storage compartments and shoves them into the drive) is the ultimate solution to density.


I never understood why they didn't make a hard disc that operated its heads in parallel and distributed data accordingly. Still one assembly, but using all the resources it has.

You'd have (for example) a four-platter, eight head drive, and instead of storing a given byte serially at cylinder 123, head 0, sector 5, bytes 0-7, you'd store it at cylinder 123, sector 5, byte 0, and light up heads 0-7 all at once.

Now, this might have been hard with early logical drive designs that tracked the physical geometry and expected a more serial format, but that was long ago hidden beneath LBA translation.

Maybe there's a power or crosstalk issue with activating 8 heads at once.
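
A toy sketch of the striped layout being described, with made-up geometry, purely to illustrate the mapping:

  # Hypothetical geometry: 8 surfaces/heads, 63 sectors per track, 512-byte sectors
  HEADS, SECTORS_PER_TRACK, SECTOR_BYTES = 8, 63, 512

  def striped_location(byte_addr):
      """Map a logical byte to (cylinder, sector, offset-within-sector).

      Instead of the whole byte living under one head, each of its 8 bits is
      written at the same cylinder/sector/offset on a different surface, so
      all heads fire in lockstep and throughput scales with the head count.
      """
      offset = byte_addr % SECTOR_BYTES
      sector = (byte_addr // SECTOR_BYTES) % SECTORS_PER_TRACK
      cylinder = byte_addr // (SECTOR_BYTES * SECTORS_PER_TRACK)
      return cylinder, sector, offset

  print(striped_location(123_456_789))   # (3827, 25, 277), one bit on each head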


Because at those scales, heads on the same actuator assembly move in relation to one another due to vibration, thermal expansion, etc.


I have a need to store a few hundred TB of data, and tape is extremely expensive for this, since it has a $5k+ upfront cost.

So I just buy external $200 USB hard drives.

There is a market for something lower latency, higher capacity, but pay as you go (no upfront cost).

Frankly I wouldn't mind some sort of giant disk drive, vinyl record size, if that makes sense and is lower cost/TB.


It didn't used to be this way. Tape was once affordable for individuals; ca. 2000 Travan was marketed to consumers and DDS wasn't too much of a stretch for prosumers. But they're all gone now and only LTO is still standing. Given the small, cornered market segment that tape now serves, there is no incentive to bring the price down. I'd like the option of being able to manage my own backups locally instead of using cloud storage, but the economics of Glacier remain too good until the storage size is several times what I could foresee needing.


My tape setup cost less than a single external hard drive. Older tape techs come down in price in time.


My company has built something for that.

Basically you have a nextcloud instance that you fill up with your data, then you ask for an LTO tape to be written with it. Repeat ad infinitum.

The tapes are yours; when you buy a tape it comes with 3 free operations (read, write, checksum control). If you want your tapes back, we'll mail them to you, else we keep them in storage for you.

As the tapes are written using LTFS, you can easily read them back anywhere with the proper drive.

You only pay a fee for the cloud storage; it also gives you access to your archive's database (what file is on which tape, etc).


That sounds fascinating. Where can I learn more? Other than writing the same data out to multiple tapes, do you have any capacity for redundancy, should a tape go bad?


Yes you can ask to have several copies, as you simply pay for the tapes. The archival solution can store data also on disks or any S3-compatible storage.

We also implemented the ability to power off disk chassis entirely (cut the power) when unused, to save power (using 60- or 102-drive chassis). So if you have enough data (a few hundred terabytes to petabytes) it can make sense too.



You can find the archival solution at intellique.org and open a free account on demo.intellique.org . The whole solution is OSS, for obvious durability reasons.


Yes. There was a time when the tape drive wasn't so expensive; the densities weren't there either but that market climbed the wrong side of the curve in a couple of ways.

The modern tape drives are miracles of precision and the tapes are these cool little cartridges, but why? The meat and machines handling tape carts liked the old 3/4in format just fine. With modern electronics for the write heads, and slightly less ruinously expensive loading and running hardware, we could still get decent data density, even if we have to hunt for the tracks on each tape load.


It’s true tapes are okay when time is not of the essence, but there’s currently a lot of pain dealing with tapes when it comes to loading, reading, writing, and verifying everything. You need some way to make sure the data on the tape matches the original after writing. Then you need some way to verify that the copy you get back from the tape matches. You need to have multiple copies because sometimes whole tapes go bad. You also have a limited number of times you can read or write a tape before it wears out, so you have to have a plan for how to do those reads and writes efficiently.

This is stuff that can be mostly automated, but it’s surprising to me that mostly it's done manually, and the tools that do exist are often bespoke in one way or another. A nice thing about a hard drive is that most of that logic is baked into the firmware of the drive, so you are less dependent on having a good system for verification and record keeping (though you still need it).

Tapes can currently take minutes to load and hours to read or write if you need to copy the whole thing. I’m sure if someone was going to make those kinds of drives they’d figure out the economics and logistics, but it would be a monumental effort, both technically and physically, to make it work well.


> You need some way to make sure the data on the tape matches the original after writing.

Do you? Virtually every filesystem has some kind of error detection, maybe even error-correction built in.

That seems like a solved problem to me. CD-R solved this by writing Reed-Solomon codes every couple of bytes so that if any error occurred, you could just fix them on the fly. (As such: you could have scratches erase all sorts of data, but still read the data back just fine)

I have to imagine that tapes have a similar kind of error-correction going on, using whatever is popular these days (LDPC?). Once you have error correction and error detection, you just read/write as usual.

-------

If Tapes don't have that sort of correction/detection built in, you can build it out at the software level (like Backblaze used to do)... or maybe like Parchive (https://en.wikipedia.org/wiki/Parchive).
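
For the detection half, here's a minimal sketch of doing it at the software level, assuming the archive is just files on a mounted volume (LTFS or otherwise); the paths are hypothetical. For actual repair, rather than detection, you'd layer something like par2 on top:

  import hashlib
  import json
  import pathlib

  def file_digest(path, chunk=1 << 20):
      """SHA-256 of a file, read in 1 MiB chunks so huge files are fine."""
      h = hashlib.sha256()
      with open(path, "rb") as f:
          while block := f.read(chunk):
              h.update(block)
      return h.hexdigest()

  def build_manifest(root):
      """Digest every file under root before it goes to tape."""
      root = pathlib.Path(root)
      return {str(p.relative_to(root)): file_digest(p)
              for p in root.rglob("*") if p.is_file()}

  def verify(root, manifest):
      """Re-hash a restored copy; return the files that no longer match."""
      current = build_manifest(root)
      return [name for name, digest in manifest.items()
              if current.get(name) != digest]

  manifest = build_manifest("to_archive")                         # before writing
  pathlib.Path("manifest.json").write_text(json.dumps(manifest))  # keep a copy off-tape
  print(verify("restored_copy", manifest) or "all files match")   # after reading back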


> Virtually every filesystem has some kind of error detection, maybe even error-correction built in.

Nitpick: you mean physical encoding, not filesystem.

Not nitpick: it’s nowhere near enough. Blu-Ray bitrot is a huge issue and if you don’t either write out your data twice (to the same disc or to distinct discs) or use PAR2 or similar, your backup isn’t worth the money you paid for those shiny coasters.


There's special archive Blu-Rays (M-Disc I believe) which are meant to last 100 years.

Not sure if they really do. We won't know for another 90 years or so :)


Yeah I’ve written about those on here before. I bought some but haven’t had time to use them; I’ve read anecdotal reports of them not lasting any longer than regular HTL (instead of LTH) discs.


> or maybe like Parchive

par2 even has options for specifying level of redundancy. I've had good experience in recovering large corrupted files from an external drive - since then, I've incorporated it into the automated backups of my personal infrastructure.

https://github.com/Parchive/par2cmdline


Tape, at least right now, seems to have a pretty steep entry barrier though. Tape drives cost thousands of dollars and each tape itself seems to be fairly expensive too. About $50 for an LTO-7 tape that's 6 TB. That's $8-9 per TB just for the tape. HDDs seem to be around $20-25 per TB.

Need 300 TB of capacity?

50 tapes at $50 totals $2,500 + a few thousand dollars for the tape drive.

Or you could buy about 17x 18TB external drives at about $450 each, totaling roughly $7,650. Same ballpark.
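
The same comparison with the assumptions spelled out; the tape drive price is a guess and street prices move around:

  import math

  target_tb = 300

  # Tape side: LTO-7, 6 TB native per cartridge, plus a drive
  tape_price, tape_tb, drive_price = 50, 6, 3500   # drive price assumed
  tape_total = math.ceil(target_tb / tape_tb) * tape_price + drive_price

  # HDD side: 18 TB external drives
  hdd_price, hdd_tb = 450, 18
  hdd_total = math.ceil(target_tb / hdd_tb) * hdd_price

  print(tape_total, hdd_total)   # ~$6,000 vs ~$7,650; the drive dominates the tape side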


For home use, you can probably get an LTO4 setup going for less than $250 (including some used tapes). LTO-5 will remain write compatible with your tapes and LTO6 will remain read compatible, so there's a forward migration path that will keep things cheaper.


Quantum's marketing for their BigFoot 5.25" drives was interesting.

They argued that a 3600 RPM drive was as fast as 5400 RPM 3.5" drive because you're spinning a larger diameter platter. Technically true.

They also claimed that since each track was larger and could hold more data, you would do less seeking, and seeking is what makes mechanical hard drives slow.

This didn't stop BigFoot drives from being slower than 3.5" drives that were 3 years older and 1/4 of the capacity.


I loved my Bigfoot drives. Yes they were slow, but they were cheap, ran cool, and (excepting certain lines) proved amazingly durable. As in, "ran as the mail spool disk for a decade" for a system sending ~5k emails a day. That one I think I still have on a shelf here somewhere, with a "well done thou faithful servant" post-it note.


In my time doing tech support and repair, they were probably the drive I replaced most often due to failure. At least until the Deathstars came out.


I think I was buying refurbs then. As I recall, the issue was with the controller boards. I know a bunch got used as cheap system/boot disks on Alphas at CGI render farms, too: some of my stock came from surplus "Titanic" render nodes. I recall they had special cases for them; 24-pack pink foam crates. I had at least two of the crates...


Compaq must have been a big buyer of them at the time - they used them in quite a few Presario models (home models, cost cutting I'm sure)


Yes I remember those foam crates. 10% of a new crate of Bigfoots were DOA right out of the box. But I guess the ones that lasted more than a week were much more likely to last years.


> Why stop at 5.25in drives? Let's go back to washing machine sized units or even bigger. Let's stack platters so dense and heavy that we can use the spindle as a flywheel UPS system too.

Current HDD tracks are around 50-60nm wide, and their limitation is not magnetic grain density (grains are around 10-12nm for non-HAMR/MAMR substrates and even smaller for HAMR/MAMR substrates). The limitation is the long (relatively speaking) actuator arm trying to precisely stay on a very narrow data track. There's actuator flexure, disk flutter, and aerodynamic noise, all of which increase with increasing platter radius. This increases your minimum track width and your minimum disk thickness/spacing, and ends up defeating the purpose of bigger disks in the first place.

Also it's important to keep in mind that in order to manufacture a drive, the entire drive is written to and read back multiple times, which takes an increasingly long time, creating a huge push for some steps to be combined.

That all being said, I'm not entirely sure where the sweet spot is in terms of data density per unit volume. It might be slightly more efficient with 5.25" drives. But I can assure you that the record-sized or washing-machine-sized HDDs of olden days are not practical anymore ...though they would be dope if they did exist!


Tape is great for large-scale setups, with the cost all being in the drives and the media being dirt cheap. But what if I just want a backup of my personal or small business's data to put into a safe? Even regular hard drives are far more economical than spending a couple thousand on a previous-generation LTO drive, or many hundreds on an even older model.


I was going through that same dilemma but they’re really not perfectly comparable options. Drives are more sensitive to storage and handling damage than a tape cartridge is.


An LTO4 drive is capable of writing 800GB raw capacity cartridges, and it can be had for ~$120.


LTO4 cartridges go for about $10-20 each. At about $15 they break even with hard drives in price per GB. If you were willing to go with used hard drives the break even point is closer to $10 per cartridge.

It's useful if you get a good deal or if you like the unique advantages of tape (smaller medium with decent shelf life), but otherwise the price advantage seems dubious.


I'm curious as to whether there would be a market for a new, more modern form of tape-based storage, one that incorporated ideas from modern HDD design, had a lower barrier to entry, and/or was based on open-source hardware outside of an existing proprietary ecosystem.


I understand that this is enterprise-storage related, but can we just get cheap optical storage back? My DVD drive can read CDs from the early 90s that I used in my 1x CD-ROM drive with caddies in my 486. Those same CDs would also work in a Blu-ray drive. That's true backwards compatibility.

I want a 1TB optical drive with ~100MB write speeds which supports incremental writes. I know I'm asking for a lot but if you could give me the ability to buy a 1TB disc for $1 or less I'm all over it. Archival discs that last for 50+ years would be a huge bonus as well. Perfect cold storage solution.


I doubt we will get another generation of optical storage after the current Ultra HD Blu Ray. The movie industry doesn't really need more and is moving to streaming anyways, and video game consoles are adequately served by them and are rapidly moving to digital-only. By the time storage requirements in either industry move beyond what Ultra HD Blu Ray can support they won't be selling disks anymore. And without a major consumer industry driving demand there's little hope of reaching an attractive price point.


The thing that really sucks about optical media is that it has been controlled by the media cartel, meaning Hollywood instead of tech companies. So we're screwed out of a viable digital storage medium because of greedy assholes.

I'd buy a Blu-ray drive RIGHT NOW if the media didn't cost $66 for a 5-pack of 100GB discs or $90 for a 50-pack of 50GB. That's more per GB than SSD or spinning rust. If the cost of the discs was 1/10th, I'd already own one.


Blu-Rays have become specialty items, like vinyl. Even most big-budget movies don't seem to have 4k Blu-Rays, let alone smaller movies. And there's now more and more direct-to-streaming movies that will never get a physical media release; think Netflix.


And this prevents it being available in good quality. Streaming is not good quality.


This is already a problem with some artists where I literally cannot purchase anything higher quality than 128k Amazon MP3.


I'm totally with you. I like buying Blu-Rays for especially beautiful films.


Yes, they could do 50Mbps or even 100Mbps 4K/8K with IMAX encoding. That could easily use up 300GB of space. It needs to be super high quality to attract customers. If Netflix or Apple could stream a 20Mbps file, what is the point of a physical disc?


This kind of exists in LTO tape drives, typically used in businesses/government due to retention requirements. They definitely aren't cheap setups though.


And that's really the point: look at most of the media solutions of the last 50 years and the most successful were "subsidized" by a larger consumer or business market. Otherwise the cost is "pure" (1:1 R&D to revenue) and very expensive.


Do you keep those discs in vacuum in constant temperature and with no light access?


More interested in a linked article:

> Wikibon argues the cross-over timing between SSDs and HDDs can be determined using Wright’s Law. This axiom derives its name from the author of a seminal 1936 paper, entitled ‘Factors Affecting the Costs of Airplanes’, in which Theodore Wright, an American aeronautical engineer, noted that airplane production costs decreased at a constant 10 to 15 per cent rate for every doubling of production numbers. His insight is also called the Experience Curve because manufacturing shops learn through experience and become more efficient.

[…]

> “Wikibon projects that flash consumer SSDs become cheaper than HDDs on a dollar per terabyte basis by 2026, in only about 5 years (2021),” he writes. “Innovative storage and processor architectures will accelerate the migration from HDD to NAND flash and tape using consumer-grade flash. …

* https://blocksandfiles.com/2021/01/25/wikibon-ssds-vs-hard-d...

Interesting prediction.
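
Wright's Law in code, with illustrative starting prices and learning rates rather than Wikibon's actual inputs, just to show how a cross-over falls out of two different curves:

  import math

  def wright_cost(c0, cumulative_units, learning_rate):
      """Unit cost falls by `learning_rate` for each doubling of cumulative
      production: C(x) = C0 * x ** log2(1 - learning_rate)."""
      return c0 * cumulative_units ** math.log2(1 - learning_rate)

  # Assumed starting $/TB and per-doubling learning rates (not Wikibon's numbers)
  for doublings in range(20):
      x = 2 ** doublings
      ssd = wright_cost(100, x, 0.20)   # flash: pricier today, steeper curve
      hdd = wright_cost(25, x, 0.10)    # disk: cheaper today, shallower curve
      if ssd <= hdd:
          print(f"cross-over after {doublings} doublings: "
                f"SSD ${ssd:.2f}/TB vs HDD ${hdd:.2f}/TB")
          break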


LOL, this reminds me of Anandtech comments (and the majority of internet comments) in 2015/2016: "HDD will be dead by 2020". They were still repeating the same thing in 2018/2019. And we are now closing in on 2022, and it is far from dead, if not growing. (I still don't understand why most people can't read roadmaps.)

WD develops both NAND (via SanDisk/Toshiba) and HDD. They know the roadmaps of both HDD and NAND. There is nothing on the current NAND roadmap which suggests we get another significant cost reduction. As much as I want to see a 2TB SSD below $99, I would be surprised if we could even get to that point by 2024. Today a portable 5TB HDD costs $129 (or $109 with a discount).

This is similar to DRAM: we might get faster, higher-efficiency DRAM, but we are not getting cheaper DRAM. The price per GB of DRAM has had the same floor for the past 10 years.

HDD is in a similar situation; it is near the end of the S-curve.


Flash has to be powered so the contents of each cell can be refreshed and data isn't lost, which makes it a less than optimal solution for long-term archival storage or offsite backups. Simply storing the physical media is easier than maintaining and securing systems to plug the flash devices into.


> Flash has to be powered so the contents of each cell can be refreshed so data isn't lost

That's not how flash storage works. You can unplug a flash drive and put it on a shelf and it will keep its data. Here's a review of portable, external SSDs that don't lose their data when not plugged in:

* https://www.tomshardware.com/reviews/best-external-hard-driv...

Are you thinking of DRAM perhaps?


No, they're referring to bit rot on flash storage due to slow leakage of the charge that represents the data. It occurs on long time scales, definitely years. But the more bits per cell (SLC vs TLC vs QLC), the more likely it is.


> No, they're referring to bit rot on flash storage due to slow leakage of the charge that represents the data.

Could not the same be said of the magnetism of the bits on spinning rust? What's the shelf life of data on an HDD?

Tapes also have magnetic charge, but are designed to be "unrefreshed" for longer periods of time.


Tapes and HDDs are magnetic. Flash and DRAM are electric. It is easier to create magnetically stable structures (magnets) than non-leaking capacitors.

Flash has a better insulator than DRAM, but it wears out with use and leaks more at higher temperatures and higher densities, with less margin as more bits are stored per cell...


I remember there was some analysis done on the SSD of an old early model Surface Pro showing that with age the SSD would become slower at _reading_ data that had not been modified recently. EDIT: It was this story https://mspoweruser.com/samsung-releases-firmware-update-for...

The fact that the SSD controller has to do anything at all to read data that was stored only a year ago is a hint that data retention on a SSD requires some (powered) effort.


Slower read speeds on old data is indeed a symptom of the SSD controller having to use its slower and more thorough error-correction mechanisms to recover data. However, since the first few generations of Microsoft Surface machines, the SSD market has made several relevant changes:

- a switch from storing charge in a conductive floating gate structure to storing charge in a non-conductive charge trap layer (Intel's flash is the one holdout here)

- a switch from planar to 3D fabrication, allowing a huge one-time increase in memory cell volume and thus the number of electrons used to represent each bit, and also opening up avenues of scaling capacity that don't require reducing cell volume

- dedicating far more transistors to error correction in SSD controllers, greatly reducing the performance impact of correctable bit errors but also enabling the use of more robust error correction codes


Depending on the design of the flash, the storage temperature, and other factors, that data can rot.

Please see [1] for one source.

[1] https://www.ibm.com/support/pages/flash-data-retention-0


That's what I'm thinking. If flash-based archives require servicing to keep their data, there are systems that provide far better trade-offs for similar amount of bother.


Can you put them in a battery powered box, with no data connection? I'm thinking my little consumer grade APC could power one for many days (months?). I'll have to check the numbers


You don't need to keep SSDs powered continuously. You just need to power them on periodically and read all the data on the drive so the SSD controller has the chance to notice data degradation and repair it before it becomes unrecoverable. This does not need to be done more often than once per year. For SSDs that are exclusively used for archival and thus never approach the end of their rated write endurance, it may be adequate to do this once per decade. Basically, any backup strategy that includes a sensible amount of testing and verification should suffice.
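
A minimal sketch of that periodic read pass, assuming the archive SSD is mounted as a plain filesystem (the mount point is hypothetical); pair it with a checksum manifest if you also want to prove the bytes are the ones you originally wrote:

  import pathlib

  def scrub(root, chunk=1 << 20):
      """Read every byte of every file under root.

      The data itself is discarded; reading gives the SSD controller a chance
      to notice blocks with elevated error rates and rewrite them before they
      become unrecoverable (exact behaviour is controller-dependent).
      """
      total = 0
      for path in pathlib.Path(root).rglob("*"):
          if path.is_file():
              with open(path, "rb") as f:
                  while block := f.read(chunk):
                      total += len(block)
      return total

  print(f"{scrub('/mnt/archive_ssd') / 1e9:.1f} GB read")   # run this yearly or so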


IBM, Samsung, Intel, and other manufacturers of flash and flash-based products disagree with you on that once-per-year time frame. As flash gets denser and denser, leakage from the cells becomes a much more difficult problem to solve.


Last time I checked, those companies were still adhering to the JEDEC standards for how they define SSD write endurance, which is based on how worn-out the flash can be while still meeting certain unpowered data retention thresholds. For consumer SSDs, that threshold is one year unpowered and stored at 30°C. [1] All of the challenges of unpowered data retention are already being addressed in the normal course of bringing new generations of flash and SSDs to market, even though nobody is yet making drives specifically tailored for that use case.

Additionally, lightly-used flash has error rates that are orders of magnitude smaller than flash that has reached or is approaching its write endurance limit. Which is why an archive-only SSD can very reasonably be expected to provide unpowered data retention far in excess of one year.

[1] For enterprise drives, the specified duration is shorter but the storage temperature is higher. The warrantied write endurance is also typically higher, so those drives are willing to take their flash to a more thoroughly worn-out state.


At this point, you might as well just invest in a tape library


If the performance difference were reflected in a significantly lower price per terabyte, then I would certainly use these. As long as the actual sequential throughput isn't terrible, they should be fine for media storage.

That said, Windows and Linux could certainly use some polish when it comes to accessing high-latency storage. Opening a high-latency or even missing network share on Windows (still?) causes explorer.exe to hang.

Linux might handle multi-second IO OK but I have had cases on extremely bad flash USB drives (hacked a bunch of write-once drives) where the IO is blocked for minutes after the write appears to have finished.

In any case, it would be really cool to throw bcachefs on an HDD like this with a cache SSD.


> Opening a high-latency or even missing network share on Windows (still?) causes explorer.exe to hang

It's not like Windows doesn't have good APIs that allow you to do better. explorer.exe (or at least the file explorer part) is just an objectively bad product. It wasn't that long ago that it didn't allow you to create files starting with a dot, despite that being an entirely legal path name. And it still doesn't support long paths or alternate data streams, making it pretty easy for software to create files that you can't view in explorer.

I'm kind of surprised that it doesn't seem to have any popular alternatives, considering how consistently bad it is (outside of small subsets, like compression or file copying)


Yes, this is really surprising. You'd think this is a pain not just for users, but for users at Microsoft.


There used to be different windows "shells" that you could install and replace Explorer with. I would guess that these are "critical system files" now and Windows would not let you replace them like you could back in the day. Probably "security something blah blah..."


I believe you still can, yes.


I find the DNA storage plays really interesting. It will be fascinating to see what the Microsoft [1], Iridia [0], etc. solutions evolve into.

"two major problems of traditional storage devices: data density and durability. One of the densest forms of storage is tape cartridges, which house about 10GB/cm3. Iridia is on a path to having a storage device that could store 1PB/cm3 and reduce latency by 10–100 times compared to magnetic tapes. The other problem that Iridia is solving is durability. Rotating disks tend to work for 3–5 years, while magnetic tapes for 8–10 years. Because DNA is extremely durable, Iridia’s technology on the other hand has an estimated half-life of more than 500 years."

[0] https://outline.com/a97xFX

[1] https://www.microsoft.com/en-us/research/project/dna-storage...


Interesting. I wonder how fast you could theoretically read/write data stored in DNA. I could imagine it being used in place of tape storage, maybe, but it seems like it would be extremely fragile if you try to read through it at hard drive-speed.

EDIT: according to some random website, the human genome contains 3.2GB, and it takes a cell 24 hours to divide. That works out to 37KB/s, which is not very promising.


Intuitively this sounds wrong, because hard drives can only scale the ratio of magnetic surface area to fixed overhead by a certain amount -- literally, you can't fit more than a certain number of platters into a 3.5" or 5.25" case. Tape, by contrast, can scale that ratio far more, just by using larger reels and/or a tape library.


What's wrong with double or triple-height drives?


They don't fit in existing bays.

But even if we define a new standard of (say) a hundred 12" platters in 12" high cases, the fixed infrastructure (r/w heads, arms, case, board) does not scale as well as tape.


Doesn't each platter side have its own r/w head? These days even a microactuator (maybe shared by adjacent platters?).


Couldn't multi-actuator drives alleviate the issue?


Multi-actuator drives would be moving in exactly the wrong direction for reducing those costs. A dual-actuator drive is essentially two drives sharing the same spindle—and the same case height, thereby cutting the number of platters per actuator in half.


https://blog.seagate.com/craftsman-ship/multi-actuator-techn...

Looks like interesting technology but I don't see how that helps here.


I'm still waiting for that 360TB 5D Optical Data Storage ( https://en.wikipedia.org/wiki/5D_optical_data_storage ) promised many years ago, albeit not by WD.


Now that SSDs are ubiquitous, hard disk drive manufacturers should up their game:

  * 5.25'' drive  which is taller (more platters) and wider (more sectors per platter)
  * Slow rotational speed to reduce power consumption and vibration
  * SMR again, but label the products accordingly
  * Small (64-128GB) SSD embedded, acting as transparent cache especially for quick response to write commands.
    * Possibility to disable this caching layer with a SATA command.


Gorakhpurwalla said: “I think even if you go beyond those tiers, all the way down into very little access, maybe write once read never, you start to get into a medium that still exist as we go forward in paper or other forms, perhaps even optical.”

https://devnull-as-a-service.com/ there you go, robust support for the write once read never use case.


Would hyperscalers jump at a 72 TB 5.25” HDD? Maybe, though the larger issue is that de-duplicated, layered large-file storage (e.g., VMs, whole-disk images) still requires warm storage for the base layers.

Might be an excellent choice for the Iron Mountains of the world, especially for long-form media storage, though I think that the majority of personal long-term storage is actually shrinking, in terms of growth rate.


You might be surprised at how much of hyperscalers' data is really cold, and also has very weak access-time requirements. For every user-visible gigabyte, there are many gigabytes of secondary data like old logs or analytics datasets that are highly unlikely to be accessed ever again - and if they are, waiting minutes for them to be loaded is OK. And the trend is accelerating since the days when Facebook kept the entire Blu-Ray market afloat to deal with the deluge. I think there's quite significant appeal in the hyperscale/HPC markets for super high capacity disks, even if they're really slow (so long as they're not slower than tape).

Background: I used to work on a very large storage system at Facebook, though the one most relevant to this discussion belonged to our sibling group. I've also missed any developments in the year-plus since I retired.


Would be relevant for folks like Backblaze and the Internet Archive, where you write once, read many, but rarely delete. 60 72TB drives gets you 4.3PB per chassis/pod, and assuming 10 pods to a rack, 40PB racks. For comparison, 3 years ago, the Internet Archive had about 50PB of data archived and 120PB of raw disk capacity.

https://news.ycombinator.com/item?id=18118556


And as "rarely" approaches zero ( think legal hold-type "you may surely not delete" ) there is a cost-saving in warm-ish storage in terms of replication and maintenance. Ensuring that your Tape archive is good is a pain unless you have huge tape robots - https://www.youtube.com/watch?v=kiNWOhl00Ao


Note he speculated about a 5.25" form factor drive. You're not going to fit 60 of those in something Backblaze-pod sized.


Looks like 60 5.25" drives from their website?

https://www.backblaze.com/blog/open-source-data-storage-serv...

https://www.backblaze.com/blog/wp-content/uploads/2016/04/bl...

Edit: My mistake! I was confusing 5.25 form factor with 3.5 :/ much shame.


Those are 3.5"


Hyperscalers use a blend of storage flavours covering the whole spectrum, and for most data-heavy purposes can mix hot and cold bytes on the same device to get the right IO/byte mix. At which point you can simplify down to _"are they currently buying disks to get more bytes or more IO"_ - if the HDD mix skews far enough that they're overall byte-constrained, yeah, they'll be looking to add byte-heavy disks to the pool. If they've got surplus bytes already, they'll keep introducing colder storage products and mix those bytes onto disks bought for IO instead.


> Hyperscalers use a blend of storage flavours covering the whole spectrum

Probably including tape, which most non-enterprise folks are often surprised still exists.

There's an upfront cost for the infrastructure (drives, usually robotic libraries), but once you get to certain volumes they're quite handy because of the automation that can occur.


Tapes are awkward though, since they can't directly satisfy the same random-access use-cases. E.g. even GCS's 'Archive' storage class, for the coldest of the cold, offers sub-second retrieval, so there's at least one copy on HDD or similar at any time.

Tapes are suitable for tape-oriented async-retrieval products (not sure if any Clouds have one?), or for putting _some_ replicas of data on tape as an implementation detail, if the TCO is lower than achieving the replication/durability guarantees from HDD alone. But that still puts a floor on the non-tape cold bytes, which is where this sort of drive might help.


I wonder if a "minidisc"-like format if feasible, where you would insert a protected platter inside of a harddisk?

I'd be willing to use that over huge HDDs. Give me 1TB platters at 5-15€/platter (consumer HDDs come close to €18/TB for large capacity). Actually, I wouldn't mind having them more expensive than HDDs per TB, as I wouldn't have to pay €250 at a time for bulk capacity upfront.



Optical discs work that way, but it's highly unlikely for hard drives, as the internal chamber requires super-clean air without any dust. The latest hard drives over 8TB have mostly adopted helium-sealed chambers, making removable platters even less feasible.


They use helium mainly because of data density. If removable platters required less precise heads, you could still use air.


I wonder if it's feasible to make many sets of platters share a single set of heads? Similar to tape, but once the heads are mounted to a specific set of platters you get disk-like latency. Keeping the environment inside the drive clean (or filled with helium) seems like the challenge.


Loading/unloading the heads when moving to the next platter would be the dangerous/slow part. It's easier to keep all the heads flying, and move them in parallel.


One of the problems with HDDs for long-term backup is that they need to be spun from time to time else they might not start up when you need them. Or, at least, that's what I remember being told fairly long ago. Maybe that's no longer the case?


Modern disks have read/write heads that don’t magically seize from lack of use any more. SSDs are way worse here: if it’s not powered up (and initialized), the individual NAND cells lose their charge and you’ll fairly quickly get some serious bitrot (especially with TLC and now QLC being the predominant options leaving very little room for analog drift). You can no longer throw an external drive into a safe and leave it there for a couple of years like you could with 2.5” or 3.5” spinning rust.


The article is interesting, but it reads like it was dictated to a speech-to-text system.


"right once read never"


Still waiting for something cheaper than buying backup hard drives for 10TB of data.

Nearly every year I calculate the cost of using tape, and it's just not end-user useful :-(

The drives are too damn expensive and probably not that nice to use.


After all the shenanigans they pulled with SMR disk drives, the only idea I have for the boss of Western Digital, I can't say on hacker news.



