Petabyte tape cartridges are coming (blocksandfiles.com)
166 points by TangerineDream on Feb 5, 2021 | 151 comments



Assuming you can read/write at 512 MB/s, it would take roughly 23 days to read or write a full petabyte (10^15 bytes at 512 MB/s is about 1.95 million seconds, or 22.6 days).

Despite the long time to read/write, any situation where capacity is king (aka any tape-drive solution today) probably benefits from something like this.

This isn't a solution you back up to and restore from regularly, per se, but one you provide dozens of backups to (and probably only restore from once). Every day, you back up roughly 44 TB of data (512 MB/s going full tilt for all 24 hours).

Then, when your backups fail, or someone asks for data from 3 weeks ago, you rewind the tape and grab that data. Crazy that this would all fit on one tape potentially.


> Assuming you can read/write at 512 MB/s

I realize that's just an assumption for the sake of discussion (and thank you for labeling it as one), but I think it's a little on the low side.

Wikipedia says (https://en.wikipedia.org/wiki/Linear_Tape-Open#Generations) that LTO-9 can write at 400 MB/s native and LTO-10 is going to be 1100 MB/s.

Since this new Fujifilm format is denser, I would expect it to be even higher speed. Although speed and density don't have a simple linear relationship, speed generally increases with density.

Generating enough data to keep the tape moving without a buffer getting empty is an important consideration, but I think this product is mostly aimed at people who have enough data to back up that they can fill a tape in a few hours. (Otherwise, why spend the money on cutting edge technology?) So they probably have lots and lots of machines to back up, and you can stream backups in parallel, writing to a staging area and/or multiplexing onto the tape.

Also, there's some risk if you leave a tape in the drive and append to it over the course of several days. While it's still in the drive, it's not quite an offline backup. The drive could malfunction and chew up the tape, or some software could accidentally rewind and start overwriting.


I decided to invest in an LTO drive to periodically archive my ZFS-based NAS. This relatively old drive supports 50-150 MB/s with a small internal buffer. My biggest problem ended up being that the default tar util on ZFS on spinning rust cannot even saturate 50 MB/s consistently, so the drive had to stop and rewind (called shoe-shining). It would take ages to back up parts of my filesystem with lots of small files (i.e. git repos).

I ended up writing a custom software with an adaptive scheduler to avoid sudden speed changes.

That is to say as far as I am concerned my ancient LTO6 drive is too fast...


I would highly recommend jacking up the --blocking-factor on tar (or your custom software) as high as your drive will accept. AFAIK the default on tar is a blocking factor of 20 (10 KiB records), which is utterly inadequate, and even sequential data on a fast SSD will shoe-shine when written directly to tape.
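
For example, something along these lines (device path and source directory are placeholders; with GNU tar, -b counts 512-byte blocks, so -b 2048 gives 1 MiB records):

    # write straight to the tape device with 1 MiB records instead of the default 10 KiB
    tar -b 2048 -cvf /dev/nst0 /data/projects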

AFAIK most large businesses wind up writing to disk and then queuing the resulting serialized virtual tapes for actual writing at a later time. This basically turns the virtual tape library into a massive write buffer. It's absolutely insane that we have to do this - but a lot of archival utilities aren't built for async/parallel IO and thus aren't taking advantage of modern storage devices.


The tool I ended up writing does essentially something very similar:

- It uses a block size that matches the disk

- It uses 50% of my RAM as a ring buffer

- On start (and on buffer underrun) it waits for the buffer to fill before it starts writing to tape.

- It adjusts the tape write speed as an exponential function of the buffer fill level. I am proud of this last one. It is very simple and automatically converges to the best average speed, keeping both the tape speed and the buffer level relatively stable.

I have been considering making the reading multi threaded to make use of more disks, but that would be a bit of an overkill for my personal 2 disk NAS.

edit:formatting


Sounds like you reinvented mbuffer.

http://www.maier-komor.de/mbuffer.html
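
Something like this pipeline gets most of the way there (sizes, paths, and the tape device are just examples, not tested on your setup):

    # tar streams into a big RAM buffer; mbuffer only starts writing to tape
    # once the buffer is 90% full, so the drive sees steady 1 MiB writes
    tar -b 2048 -cf - /tank/data | mbuffer -m 16G -s 1M -P 90 -o /dev/nst0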

Why not publish yours as well?

For other file systems (not ZFS), spd_readdir.so can offer a good performance boost.

https://kernel.googlesource.com/pub/scm/linux/kernel/git/djw...

It's probably possible to do the same on ZFS, but CoW certainly makes realizing this concept more difficult.


- How many TB are you backing up?

- How frequently do you perform backups?

- How often do you do fulls vs incremental?

- What drive/autoloader are you using, and what was the total investment in drive + tapes?

I keep considering a used LTO but I've only got 20 TB and it doesn't seem worth the trouble. I should just buy another 20 TB of drives and keep them offline when not backing up.


I only have ~1.5TB of stuff that is really important, which fits on a single LTO5 tape. (I said LTO6 before, I was mistaken.) I archive every ~3 months, generally before I make bigger changes. I always do a full tar archive; it typically takes the whole day, definitely an event my family is aware of. I use a desktop HP Ultrium LTO5 drive. It all fits on a single tape, but even if it didn't, changing tapes twice a day would not be a big ask.

Months of researching, ~$900 total on 2 used drives (first one was broken) + enclosure + SAS controller + 5 tapes.

You need the enclosure with forced air cooling, you can't put the bare drive in a PC tower, it will overheat.

I cannot imagine archiving 20TB with my cheapo setup, but I also cannot imagine spending ~3K for a newer drive, this is a sweet spot for me. After buying a broken drive first, I am now always nervous that this drive will quit on me. It is definitely a passion project at this point.


Thanks for the info. For 1.5 TB I'd just use an extra drive.

My 20 TB is 11 x 3 TB used SAS drives in a raidz3 config. You can pickup a 3 TB drive off eBay for $18-$24.

I should've probably just bought four 10 TB WD red drives, but building this system has been, uh, a passion project I guess. :-)

It's still cheaper than if I'd bought new red drives, but I haven't factored in the electricity cost of running 11 SAS drives vs four red drives... the breakeven is probably less than a year. :-(


I back up around 20TB and the easiest solution I found was to just buy 4x12TB drives and mirror to the cloud (yielding 2x offline copies). For stuff that's really, really important (around half of that), and where fast access is needed if a restore is required, I also back it up to SSD.

It's not the greatest solution, but when you're hovering at some small multiple of common single drives, it's certainly the easy answer.


Tar not working for modern tapes by default is pretty ironic.


Not necessarily. Tar was invented way back, in early Unix days. I know it has been updated over time, in GNU tar and other Unix versions, but they may not have anticipated the massive growth in tape capacities and speeds. Using dd with a high block size to write "virtual tapes" (as described elsewhere in this thread) from disk to tape might be one solution (not tried, just had the idea now).
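
Roughly like this, as a sketch (paths, sizes, and the device name are made up):

    # stage a "virtual tape" on disk first, then stream it to the drive in large blocks
    tar -cf /staging/project.tar /data/project
    dd if=/staging/project.tar of=/dev/nst0 bs=1M status=progress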


It's been best practice for a while in the 'enterprise' realm to do disk-to-disk backup first so that the streams are all sequential and then copy to tape. Enterprise-y software can also generally multiplex multiple clients to a single tape.


I used Amanda Backup[0] for years on a Unix farm, and that's exactly how it works. Client apps on all of the machines being backed up ran tar and streamed the output to a backup server where it was written to disk. Then the completed files were streamed out to the tape.

[0] http://www.amanda.org


I don't understand how this scales. If you have 1 PB to back up to tape, you first have to write it to disks on a backup server? So you have to buy 1 PB worth of backup disks first, is that correct?


You'd never buy a 1PB tape drive if you only had 1PB of data to backup.

A 1 PB (1,000 TB) tape drive would probably cost somewhere from $10,000 to $100,000+. The tapes themselves would be cheap (cheaper than hard drives or SSDs). So you'd buy many tapes, perhaps 50 of them, to be cost-effective.

That's why a lot of comments around here are talking about tape libraries (entire boxes of tapes). You must plan to use more than 50 tapes (aka: 50PBs) before you reach any level of cost-effectiveness.

---------

From there, we see that 1PB of data is simply "infrastructure", to help you lay out the data before it gets to the tape. There are 20TB hard drives today: a single machine with 50 x 20TB hard drives (and maybe some flash storage to accelerate the write to hard drives) is what we're looking at to feed the Tape Drive.


When I was a kid and we had to do a school internship, I went to my dad's company. Since we were just school kids, they didn't really give us anything to do. Except: Loading and unloading tapes and moving tape lists from the main tape library room to the backup safe. The backup safe was several hundred meters away in a different building (on the same lot).

This was basically their WORM archival system. The tape library was a room full of tapes, wall to wall with shelves in the middle. Those were the oldest tapes. The in-use tapes were inside "Robbie", the circular tape robot. Not sure if it's this model exactly, but it sure looked like this: https://bjgreenberg.files.wordpress.com/2012/05/tape-library...

The tape robot would periodically spit out tapes on one end that we'd collect and then file away in the correct location on the shelves and vice versa, got a list of tapes to get from the shelves and insert into "Robbie". I don't remember typical tape sizes at the time, except that obviously it was much more capacity in a little tape than I had in my hard drives at home at the time :) Now imagine, Robbie filled to the brim with these 1PB tapes...

Robbie and his extended tape library was in a room right adjacent to the actual data center floor, which had an IBM 390 mainframe and lots and lots of EMC^2s, AIXs, Suns and everything else you can think of being in a data center ca. 2000. You obviously had to get into and out of the data center through a single person entrance gate (a vertical tube that literally only one person could fit in, opening one side at a time only). First day I'd just walk through the data center and look at everything, reading labels and such. Found the server that all my personal emails were going through, as my ISP was hosting there! Heaven for a kid like me.


You wouldn't usually backup the whole thing first. That 1 PB is probably a bunch of smaller projects, so you'd create a tar of the first project, then flush it to tape. Tar up the second project, then flush it to tape.

It's not so bad if you're writing 23 tarballs out to tape and pausing between each one. What you really want to avoid is where the tape drive is pausing many times during a single archive, perhaps because tar can't keep the drive fed while it's traversing a big directory of teensy files.


Sure, why not? It's an incremental cost of the backup system. Look at the current generation stuff. An LTO-8 drive puts 30TB onto a tape, but that's assuming 2.5:1 compression, so the native capacity is only 12TB.

Let's assume you're actually going to do the compression as you build the virtual tape. What's a 12TB hard disk cost these days? A lot less than the $4,000 that the tape drive costs. Heck, 12TB of SSD is going to be cheaper.


60x18TB drives and a server to put them in would cost about US$ 45K:

* https://www.45drives.com/products/storinator-xl60-configurat...

That gets you about 1PB raw.


It’s not exactly best practice, but instead of tarballing the files themselves maybe you would have better performance just writing a snapshot via zfs send direct to tape?
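
Something like this, as a sketch (dataset and device names are placeholders; you'd still want a big buffer in front of the drive):

    # snapshot, then stream the snapshot through mbuffer onto tape
    zfs snapshot tank/home@archive-2021-02
    zfs send tank/home@archive-2021-02 | mbuffer -m 8G -s 1M -o /dev/nst0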


I admit to having thought of that, but using a zfs stream for archival would make eventual restore that much more complicated. A single bit error at the beginning of the stream would make the rest of the archive inaccessible, whereas with tar I can literally slice the tape and still recover data from it.


You can now tell ZFS to still process bad snapshots (with the risk of still losing some corrupted data).


long ago there was a utility called `buffer` that solved that problem. i know i've seen a "buffering netcat" go by here recently, that should serve too.


LTO speed stopped doubling with LTO-4 - it's only up to 400 MB/s uncompressed now - so by the time LTO-15 is out it might be around 1 GB/s or more.

Still a long time to write.


Tape libraries often have multiple drives, and it's possible to write to them in parallel.


Tbh, I think that in the near future we'll figure out how to flip one bit worth of information in a single atom, perhaps with a high-precision magnetic field, and use crystals to store amounts of data we don't have prefixes for yet.


Doubtful this is useful because it’s just not enough mass to be stable.


Not anywhere near room temperature, anyways. Plus it's not particularly helpful to store data in a single atom when it takes a much larger device to read the data. NAND flash cells are already about as small as they can be, but now they're being stacked on top of each other for better density. This is a much more promising avenue. Holographic storage also leverages the third dimension for huge increase in density compared to traditional optical drives.


What do you mean? I thought atoms in crystals are well localized and only oscillate around their average grid position.


I mean that noise in the environment will interfere with the state of your atoms too easily.


The state I was thinking about is stable electron configuration. If an atom has two such configurations, and can switch between them, that's one bit worth of storage.


And what if a stray photon should strike your atom? Does that not have a chance of changing the state? I don’t know much about electron configurations.


I suppose a random photon won't have the right energy to switch the state. But even when this happens, we have error recovery algorithms: cells in regular hard disks die all the time.


A single atomic electron state will almost certainly switch to a problematic degree, except for very exceptional materials, in typical room-temperature conditions - free ozone, UV light, background radiation. Most crystals will grow or change at room temperature in some way over time (normalizing to the lower energy state if they weren't 100% there already, or high-energy perturbances like UV light damaging the matrix and then 'falling down' the energy ladder again). Diamonds ablate through oxidation as well.

Error correcting helps of course, but we already have to do a LOT of error correcting to make even the 'low density' spinning rust drives we have now work. At some point you have so much noise that your error correction overhead is too high to make it worthwhile - which is the current issue with high-qubit quantum computers, for instance.

We’re a long way from having the materials science to be able to do this. Same issues plague quantum computing. Maybe some day though?


MRI machines do something like this, though probably not at atom-scale accuracy. UV light can be ignored because the crystal would sit in an opaque case. A trivial error correction is writing a bit to many places, an idea similar to Bloom filters.


If it's sitting in a truly opaque case, it definitely isn't a tape! Shutters aren't perfect, and they open every time the cassette is used. That sounds more like a hard drive? UV and other short wavelength light can come from inside the case or the crystal from background radiation or cosmic rays as well. Reed-solomon encoding, n-copies, they all have capacity tradeoffs and error rate detection/correction limits, and get away from the 'single bit' ideal quite a bit (for n-copies, at least by 2x, usually 3x even in the most ideal situation). It's also non-trivial to avoid something like a background radiation emission 'tunneling' through and causing non-trivial damage to a sector.

It's going to be statistical level analysis for awhile, at least at temps near room temperature or devices only practical in a specially shielded lab.

MRIs are heavily shielded and regularly calibrated, and are only doing bulk statistical analysis. AFM or MFM [https://en.wikipedia.org/wiki/Atomic_force_microscopy] is the closest we can currently get, and they suffer from a number of issues, including damage to what it's touching, repeatability/precision issues, thermal drift, etc. and that is just getting what is usually a one time read.


We're getting closer to writing out the Akashic records. Interesting recursion of our present moments that it becomes observable.


Could that be mitigated with many read heads at different locations?


Tapes only have one head. Otherwise, they couldn't wind or rewind.

It'd probably be mitigated with a wider tape and a wider head: the wider the tape, the more "data per inch" passes through the head, which allows bandwidth to increase. That would change the LTO-tape dimensions however, so that probably isn't an option.

Fortunately, by shrinking down the size of data (more data per square mm), the bandwidth necessarily increases. So maybe this technology will have a faster read/write speed than today's tech.


I don’t see a reason why you couldn’t have multiple heads. As it stands, in order to write a complete LTO tape, you make multiple passes across the tape. A second write head would allow you to make fewer passes across the tape to fill it.

The data on an LTO tape is written in a serpentine manner, or boustrophedon if you prefer, with overlapping series of tracks separated by the servo tracks used to position the head.


What would happen when one head can't write out data fast enough (for some reason) and the tape has to slow down and back up. Now, instead of one head, you have multiple heads that have to resync and get going again.

I dunno, if multiple heads were worth it, I think it would be common.


> What would happen when one head can't write out data fast enough (for some reason) and the tape has to slow down and back up.

The tape does not rewind in this scenario.

It is fairly common for the system to not be able to keep up with the full write speed of the tape. Just as a reminder, LTO-7 has a write speed of 300 MB/s, and filling a 300 MB/s pipe for five and a half hours is no small task.

There are two ways the system deals with it. First, the drive will slow down the tape to match the data rate. Second, the drive can mark a section of tape as failed and then rewrite that data to the next part of the tape. Instead of rewinding and writing over a section of tape, you’re using up more tape this way.

Tapes, like hard drives and SSDs, come with a certain amount of extra capacity to allow for this as well as other write errors.

> I dunno, if multiple heads were worth it, I think it would be common.

I’m sure that it’s “not worth it”, yes, absolutely. But I don’t see major technical hurdles here.

One technical hurdle would be figuring out how to, say, fill an 800 MB/s pipe for six hours straight. That's hard. It gets harder when you realize that a single tape library might have as many as 64 or 80 tape drives. 64 drives, multiplied by 400 MB/s, is about 26 GB/s.


If you have a 64 drive tape robot, you probably have 65 computers: one controlling the load/unload mechanism, servicing 64 servers each dealing with one drive. (Or perhaps 16 servers that each handle 4, or 8x8.) Each of them has its own high speed drive for cache and independent NIC.


That's part of the solution, yes, but it's difficult to get the whole system set up this way and achieve something near maximum throughput. My understanding is that most people who use tape don't get anywhere near maximum throughput over the long term. Naive use of drives is still quite common... things like backing up with tar, or dumping a database straight to tape, and it's easy to imagine buffer starvation scenarios.

Returning to the idea of increasing throughput on the drives by adding heads, I'm just doubtful that it would be the best way to improve overall system performance. For alternatives, you can either get engineers to optimize usage of existing drives or buy more drives.


Here's what I want.

A box that has a tape drive and a hard disk. For every disk write my computer does, I want it to send to the box a copy of the data, what disk it was written to, and what sector number.

I want the box to store that information on its hard disk. When it accumulates enough that writing it to tape would not be too inefficient [1], it should write it out to tape and free the space on its hard disk.

I monitored my SSD writes for a few years, to see how long it would be before I had to worry about hitting their write limits. I found that I was writing under 3 TB/year on both my home and work machines.

A box like that, even with a modest size current tape cartridge, would be enough to handle all of my computers for several years before filling up the first cartridge.

[1] I don't know if this is still the case, but it used to be that when tape drives stopped and started, you'd get a gap that wasted some tape.


3 TB/year? Why even bother with the tape at that point, a $300 hard disk can store nearly 5 years and doesn't require a special 2 drive solution.


So, backblaze / dropbox.. but in a physical box?


Presumably much less expensive since cloud storage providers are mostly instant access data stored on HDDs and network connected to the cloud.

Plus, I'm sure writing to tape is slow. But it has to be faster than most people's paltry upload speed (at least in the US).


Tapes might be cheaper than HDDs but the tape drives are obscenely expensive to the point where it only makes sense to buy one if it would replace at least 100 hard drives.


With LTOs you also have to migrate onto a new standard roughly every third generation. Any given generation of drive can read the two previous ones, and write to the previous and current. It's not a forever drive. Also, it's not that reliable to store only one copy. It adds up, but it's still better than spinning rust.


Actually reading tapes from two generations back can decrease the lifetime of your drive considerably. So in a larger setup you're probably not going to want to skip as many generations (in part because you can and practically must incrementally upgrade).


What if we used the 1Gbps lines we have to our houses to sync data to a neighborhood level tape back system. We could just swing by the minimart and pickup some beer and our weekly backup on a datacart.


I think you are overestimating people's interest in managing their own data.


It isn't overestimation as much as willful disregard for reality; if I outline my version of the universe, eventually it will become true.


I like it. We don’t get anywhere without this sort of optimism.


> For every disk write my computer does,

You are going to want a high speed caching layer in there too.


The generations so far have arrived roughly once every 2.5 years. Stretch that out to three years. Nor has each generation provided a doubling. Let's suggest a factor of 1 and two-thirds, to be on the safe side.

Starting from LTO-9's 18 TB native capacity, we could then expect LTO-10 in 2023, LTO-11 in 2026, LTO-12 in 2029, LTO-13 in 2032, and LTO-14 in 2035, with a final uncompressed capacity of about 234 TB. Note that I have been conservative for both the generation cadence and the increases in capacity. If you squeezed in an extra generation and assumed a full doubling each time, you would end up at 1,152 TB, or the 1 PB predicted by the article.

This looks to be ... right on schedule.

Now if the cost of the drives would only come down ...


Honest question: is the data-capacity scale of LTO tape the driving factor, as opposed to using a lot of microSD cards, for example?

I haven't used LTO tapes in a long while so I am not current on modern backup strategies... but is tape still king in backups?


Tape is the only commercially-available storage medium that can be used for long-term archival storage, e.g. 10 years or more. The tapes are typically specified for 30 years of storage time, but it is likely that you will have to transfer the content onto larger tapes earlier than that, or you might not be able to find compatible drives.

Nonetheless, a 10 year storage time is easily achievable with tapes.

There are no competitive devices. The optical disks have a far lower capacity. Moreover, except those made with gold, which are no longer produced, the metallic mirror will oxidize after a few years and the disk will become impossible to read. The SSDs and other flash-based devices lose the charge after a few years. The HDDs that are not in active use will develop after a few years mechanical problems and they may remain stuck.

There are other technologies that could be used to make memories suitable for archival purposes, but nobody has tried to develop commercial products. The market is small, because most people do not think much about the future so they discover that they should have spent more on archival storage only after some precious data is lost and it can no longer be retrieved.

In my home, I am using LTO-7 tapes to store data whose loss I consider unacceptable. For example, because of space problems, I have scanned a huge quantity of books that I previously owned, then destroyed or donated the paper copies. Since I now only have the digital copies, which are much more prone to irremediable loss than the paper books, I take serious precautions to avoid any such loss, e.g. for each file I store copies on 3 tapes, which I keep in different locations.


Well, maybe.

We use LTO-6 to back up 60 TB of data daily from production, so ten tapes a set. 14 dailies, 13 monthlies, and 7 yearlies.

We are probably moving to a Data Domain solution (as in spinning disk), but the key to archival security is that backups are replicated between two geographically separated sites. Space usage is minimized by compression and by logic that can treat newer backups as extensions of existing data rather than duplicating all the data again.

The big advantage is restore speed. Spinning up ten tapes (which we write simultaneously because of speed needs) is cumbersome and slow, with restoring a single table taking nearly half as long as the actual backup. On a DD system it's less than 30 minutes, and with backups replicated to both centers, a restore to production can be done from data saved at the DR site; we replicate from production to DR and only back up DR daily.

The issue with tapes has been that you never know when the one you need has gone bad until you need it. It is also not common to replicate tape sets, but a disk-based solution lends itself to easy replication and transmission from site to site.

So I can see both solutions working side by side for some companies, but for many, the disk-based backup solutions out there have come down in price (most are leased) as storage devices have dropped.

Our biggest limitation is still the number of ports between the systems and the backup devices.


> the metallic mirror will oxidize after a few years and the disk will become impossible to read.

I think NIST or somesuch did accelerated decay studies and found that even in moderately decent storage conditions plain DVD-/+R lasts about 30 years, with BD-R in the same conditions lasting vastly longer than that. I do know that I have 20+ year old CD-R backups* that still work, and as far as I can tell all of my DVD backups from 2002 on still work. I recently got some 100GB BD-R but haven't written them yet; my laptop doesn't have enough storage for Windows to do its thing, and I have no idea how to do a BD-R "files" backup on Linux. Every Google result is "how to back up your Blu-ray movies", which is explicitly not what I want to do.

* I also had a DVD-RAM drive in 1998/1999, with the "caddy" cartridge, and probably still have a couple of those discs, unsure if they work/if I can read them.


It is true that there are optical discs that will survive for a long time, because the plastic coating of the metal has a low enough permeability to the oxygen from the air.

The problem is that as an individual, it is impossible to guess whether the discs that you have bought are good and they will last 50 years or they are bad and they will last only 2 years.

There have been many published cases when the lifetime claims made by manufacturers were proven to be false.

The reason is that it is difficult to control the permeability of the coating during production. While many discs may be good enough, there are also many that will have a short lifetime.

The risks of storing data on standard optical discs are too large to be acceptable. There were special archival-quality optical discs with gold reflective layers, e.g. made by Kodak. Those could really be guaranteed for 100 years or more, but they were too expensive, so they were discontinued.


I mourn the death of optical disk, back in the primetime of DVD they were actually viable for backups of personal photo libraries and the likes.

I even had a special printer that would put a custom image on the face of the disk!


Gosh I miss the old MAM-A gold DVDs. Are there none left?


Just geometrically, tape will be king for ... quite some time.

Imagine an HDD platter. You can only write to the surface of it. That's a small surface area, when you think about it. Now imagine the surface area of an unspooled tape ...

Of course, SSDs are a different beast but they are nowhere near as dense.

The spooling and unspooling for the geometric advantage is a tradeoff between time and space. Therefore, tape will continue to reign supreme for colder backup up until such time as some kind of holographic, volume-penetrating technique reaches the sheer density of a wound tape, adjusted by some kind of time factor for the read-write of our hypothetical holocube.


tape is still king of long term or cold backups. most people also keep hot backup on hdd or ssd in addition. but there are usually off prem long term cold storage backups along with these and those are very often tape. these tapes are for like FUUULLL dr and not often used but kept as insurance against the hot short term backups getting hosed


MicroSD cards are nowhere near reliable enough to stripe a petabyte of data across them. At current prices it would also cost around $120,000 + the cost of the readers and the robot to swap them around.


Would there be a market for a smaller, prosumer version of tape drive backup? Maybe the size of a MiniDisc or Zip disk, at around 5TB.

What is the current strategy for consumer offline backup? Simply copying to another HDD?


There used to be, but that market died. Back in the 90s, some formats were relatively popular with power users and small businesses:

QIC (https://en.wikipedia.org/wiki/Quarter-inch_cartridge)

Travan (https://en.wikipedia.org/wiki/Travan)

Data8 (https://en.wikipedia.org/wiki/Data8)

I think what killed them is economics. The per-gigabyte cost is lower because tape media is cheaper than hard drives, but the initial investment is much higher because tape drives are expensive. The reason tape drives are expensive probably has to do with manufacturing volume, but I'm sure it's also because they are mechanically complex. You need to wind tape onto a spool, you need to keep it at the right tension, and there's the loading/unloading of the tape cartridge.

And tape drives have wear and tear (tape physically dragging across the head), and when the drive breaks, it's expensive to repair, which is big drawback for an individual.


Being able to write your own optical media killed off tapes as consumer devices. Now that optical drives are on the way out there might be a place for tapes again ...


When would they be better than flash storage, though?


I have a ton of files I can properly delete, but which I might need back at some point, possibly decades down the line. I would love to have an affordable tape machine that I could copy the files with, then store one tape at home and one at my parents place.

The tapes are affordable, we need the machines to be affordable too.


I would put flash as less reliable than HDD.

Basically, storing digital data securely, safely, and reliably as a consumer is just a bag of hurt right now. Companies want to push you all to the cloud. I wasn't happy when that direction started, and I'm increasingly unhappy with it now given how the companies have acted.


For longevity I suppose. Don’t you need to power on flash storage occasionally?


Not just power it on, but also read it. Flash isn't really a good medium for anything other than operational data. It is a good idea to have a write behind mirror from flash to either HDD, tape or highly replicated storage.


Can you produce flash cheaper than magnetic tape now?


When I researched this last year, some people suggested m-disc, but it's >10x the expense of spinning rust. If you're a home-labber, you might want to try buying a used LTO drive, but researching, sourcing, caring, and feeding of LTO is a lot unless it's a hobby project you're doing for fun.

Here are the alternatives I found: <https://photostructure.com/faq/how-do-i-safely-store-files/#...>


I cut my teeth on tape — Travans, IBM3590, IBM3592...

Cold storage is a wonderful thing. A mechanical bridge from the ethereal to the real world with a true disconnect between the medium and the mechanism. I worry about the longevity of our data but some people are doing good things.

Act now! Support your local internet archive!


14 years from now? Call me a luddite, but I'm going to hold off even thinking about this until then. Maybe I'll get one for my retirement party a couple years later...


Same reaction.

That's a hell of a lot of time, honestly; I don't know what the world will look like in 15 years.

That said, 1 PB is a really massive amount of data.


Why can't we have higher density optical disc formats? Is BDXL at 128 GB really the end of technological innovation?


Previous optical disc formats reached critical volume from content companies.

Now that it's possible, they would much rather get the recurring revenue of you renting access to their content.


They're happy to sell you Blu-Ray discs if that's what you want. Consumers, which is where most of the market is, don't really need higher capacity for anything. And even a 10x increase for BDXL probably doesn't really get you to a capacity that's all that useful for most types of archiving.


You're right that my reason is probably overly cynical, particularly if 128GB is large enough for 2 hours of 8K content. The point still stands that without millions of drives being manufactured, the pricing would force any new optical medium to be expensive.


> And even a 10x increase for BDXL probably doesn't really get you to a capacity that's all that useful for most types of archiving

Sure it would - it would get over the 1 TB point, which would mean most users could back up their devices to a single disc again.


I expect the economics wouldn't work out for most consumers. (And even if they did, how many would actually make regular backups?) And I'd point out that they can back up, even in an automated way, multiple terabytes to a roughly $100 disk drive today.


That's not very useful for an offline backup. Optical is meant to last decades on a shelf in a way HDDs aren't. Discs can be put in a letter, etc.


I would still love to find out if the statement that early AWS Glacier backups were done to BDXLs is true.


It isn't; there's a next-gen archival disc format called Archival Disc with 300GB discs and plans to scale to 1TB/disc. It's only used in cartridge library systems like Sony's ODA, which cost about as much as tape drives do.


2035 is a long way off. I've seen estimates of 1PB on ssd coming in 2023. I honestly don't see the numbers working out for tape if this is the trajectory


Fascinating!

My curiosity is, could functioning transistors somehow be created on either a BaFe or SrFe substrate by a magnetic or other electromagnetic beam of some sort?

Perhaps one of those is the material for doing something like that...

Anyway, amazing storage capacity!


Just curious — who's using these today? Is it mainly for cases where you need to be able to easily physically transfer data?


Not nearly as uncommon as you’d think. Bigcos have all sorts of data requirements. One of my first jobs involved pulling tapes from libraries and sending them off to Iron Mountain every day (and on occasion retrieving and restoring from said tapes).

On the biggest end, you have folks like AWS glacier with an absurd amount of homemade libraries (and generally no tapes stored on/off-site outside of libraries)


I've never seen compelling evidence that Glacier isn't just a different API and pricing policy on top of S3


$0.99/TB replicated with Cold Storage seems pretty compelling to me.


I thought the consensus was that Glacier uses cold storage disk drives. It was speculated early on that Glacier was tape but AFAIK that was never confirmed and there's no published evidence that's actually the case.


I always liked the rumors about it being huge BDXL libraries best. For example this HN entry: https://news.ycombinator.com/item?id=7647571

IIRC Facebook actually built something similar for their cold storage.


I find it a bit surprising that AWS has been able to keep this such a secret, especially given that I would have imagined that big enterprise users would presumably want to know what Glacier actually is before depending on it.


NDAs are wonderful things.


Big companies that have a sizable on-prem data center and need a cost effective way to store backups off site.

At least that was the case at a place I worked ~5 years ago.


Everyone who has lots of data and wants to store it as cheap as possible - tape is still the cheapest medium in $/TB. (Provided you have enough data, otherwise the initial overhead for drives and a robot might pivot the equation towards HDDs)


Insurance companies and banks for one.

Once you've invested in the initial infrastructure (drives, libraries), the incremental cost of extra tapes isn't that much, so you can keep things on the shelf for a long time for not a lot of money.

Send to two tapes for resiliency, and every few years do a tape-to-tape copy to stay on the latest hardware. LTO drives can read two generations back: the just-released LTO-9 stuff can read LTO-7 tapes, and also write LTO-8.

Though a lot of backup software supports S3 APIs, so some folks are sending (encrypted) bits to Amazon Glacier (Deep Archive) since they practically will never have to retrieve it.


It's probably more expensive to use disk for what you'd get from a tape robot, considering the worse reliability and shelf life of disks and near-absence of hard disk loading robots on the market.


A former IBMer told me a story about working at some large US government tape library with a tape robot that was essentially a powerful industrial 6-axis arm with sensors on top of treads. It did a good job shuffling tapes until someone accidentally messed up one of its guides, after which said robot proceeded to crash through a brick wall.


At least it didn't crash through a pile of tapes containing valuable data.


Some types of tape have append-only mode, which is great for things where you want your backups to be immutable and auditable. Imagine a logging server that writes to tape regularly so that the logfiles can never be deleted or altered. That's super handy!


They are still used in VFX a lot.

It's a good way to move entire shows to and from online storage. It's effectively a one-off cost, vs a continuous cost for power and cooling. It's also a fast way to move data from one company to another: a ten-drive robot is many times faster than 10-gig Ethernet.

For other companies it's attractive because tape is literally offline. Once it's out of the robot, there is no automatic way to get it online again. This means that it's far harder to tamper with.


Some video production companies use tape for archiving. If internet speed isn't fast, it can be the only realistic and cost effective option.


They’re used by heavily by CERN to archive the LHC data. In ATLAS, we actually have to go back and reprocess that old data sometimes. So there’s dedicated software that runs on the grid to stage petabytes worth of data back onto hard drives, distribute them across the grid, process and then wipe the disks again. There are even tape robots in the data center here :)


An example is the Large Hadron Collider, where each experiment produces an enormous amount of data from all its sensors.


That’s not just the LHC - it’s pretty common across high energy physics, nuclear physics, and astronomy, at least.

It’s a pretty great fit for the technology. You keep the most recent data on hard drives for analysis, and as new data comes online, the old data goes to tape. You don’t want to get rid of the old data, of course, because you need it for analysis verification and when people come up with new analysis ideas, but the older data naturally gets accessed less and less as time goes on, making tape storage a natural choice.


I've been told that NOAA uses tape to stream time series weather data into long-running simulations.


I use LTO to archive my home NAS periodically (see my comment above). I admit the decision to purchase the (rather expensive) drive was influenced by curiosity, not cost or convenience. But now I can keep hard copies of my entire digital life off-site at no additional cost for decades at a time.


You backup to a disk array and offload the stuff you want to keep to tape for offline storage.


All data is stored physically. All backups are physical.


Is any of this ever going to be accessible to a consumer market? Or is it always going to cost gobs of money?


No. The consumer and SOHO tape market died around 2000 as companies sold off their business units and/or went bankrupt. Trickle down in the LTO range is buying a refurbished drive that is several generations behind and you better have SCSI or SAS in your computer.


Kids these days... Spectrum protect/Tivoli storage manager/adsm is where the backup is. Since 1989.


Remember 3D holographic data cubes? I was promised, in the mid-1990s, that we'd have them by 2000 and that they would have "up to" 100 times the storage capacity of a DVD.


I worked on holographic data storage in LiNbO3 crystals in the early 2000s. We had a lab prototype which would store over 1 GB in a roughly sugar-cube-sized crystal. The technology is certainly possible. However, the amount of optics required made it a huge and expensive system, and there was not a lot of funding. Using modern diode lasers, the setup could be miniaturized a lot. However, the big limit is the wavelength of light. Green light has a wavelength of roughly 500nm, and that determines the smallest features. This value was small back then but is huge compared to modern chip structures. We were losing the density battle even then. There is one trait still not matched, though: stability. Those crystals are about the toughest things imaginable. Unless you smash them with a hammer, they won't rot or decay. Storage times of hundreds if not thousands of years would be possible.


Long-term storability seems very desirable for certain classes of data. But a problem might be to conserve the crystal reader system which I assume needs to be well-calibrated and sort of fragile. You might have to have some sort of bootstrapping solution like very basic materials and instructions to build a simple 3D printer which then builds a more complicated assembly system which is then able to build the crystal reader.


Actually, the way we wrote the data, you don't need a complex reader system. Illuminating the spot where the data was stored with a laser beam would project a 2D image, like a large QR code, which any camera could pick up. So the reading equipment would be rather easy to replicate.


Yeah, so we currently have micro-SD cards with a 1TB capacity, which is more than 100 times the storage capacity of a DVD. I see your holographic data cube and raise you a fingernail.


>I see your holographic data cube and raise you a fingernail.

Reminds me of this comment:

https://news.ycombinator.com/item?id=17647545


I guess it's still glass with incredibly tiny details etched into it, just not quite as fantastic looking.


That micro-SD will lose its data in 20 years.


So you're saying this tape claim will end up being some new form of CD?


That's not at all what he is saying.


Every few years I remember the story from 2000-2001 claiming that a German research lab successfully stored 10GB WORM on a roll of clear sticky tape ("Scotch" tape).

I couldn't really tell if it was serious or not at the time, but it looks like nothing has come of it.

[1] https://uat.designnews.com/automation-motion-control/tale-ta...

[2] https://tech.slashdot.org/story/00/03/18/1218250/scotch-tape...


Oh the good times... I remember when before christmas Steffen and Matthias goofed around in the university basement with their equipment... Actually they pivoted from data storage to security and anti tamper stickers, and are still in business [0, 1].

Steffen Noethe is into life sciences since 2013 [2].

P.S.: It did work for up to five or so layers, then focusing and laser power were hard to handle. They never developed a robust drive from that principle and the spot in between flash, DVDs and tape was closing too fast to create enough business...

[0] https://www.tesa-scribos.com/de [1] https://www.tesa.com/en/about-tesa/press-insights/stories/th... [2] https://www.linkedin.com/in/steffen-noehte-b318a728


Thanks for that! I love the power of HN to reach people with personal knowledge about technology.

For anybody else interested, I believe we're talking about this patent: https://patents.google.com/patent/US6386458B1


Just contributing my iota where I can... =:-)

You are right, that is one of them. There's some more even...

[0] https://patents.justia.com/inventor/matthias-gerspach


Paper storage was another weird one: literally using a sheet of paper to store information, dots or symbols of some sort, with supposedly hundreds of gigabytes stored on a sheet of paper.



I couldn't recall the application name. Thanks!

I tried it and it did print on paper, but it seems the application isn't great at connecting to a modern scanner on a multi-function printer (MFP). It's a networked MFP, but even with the MFP connected via USB it balks and says no scanner found.

https://i.imgur.com/GJu2GPV.jpg


I was thinking about how it probably hasn't aged well and went off looking for forks or other projects like it. I found this and am considering building it to play around: https://git.teknik.io/scuti/paperback-cli


Mainframe programs used to be loaded from paper tape into memory and then run, so I've read. Maybe same for minicomputers.


technically, every flash memory today is a cube (well, more of a skewed cube, a Parallelepiped)

Mostly because they can't figure out how to make things faster, or as fast as the interface, which keeps getting faster, so they just pile a bunch of flash controllers on top of each other. Everyone is running striped RAID-0 and doesn't even know it.

And if you consider that it also uses varying analog voltage values on each node (!SLC), it is arguably a 4d cube. Take that, 90s!


I kinda miss this "storage media enthusiasm" of the late 1990s and early 2000s. Sure I like my usbkeys and cloud storage, but its just not the same as "data cubes", scotch tape or even just actual products in these weird "in-between" times back then like jaz&clik drives, minidiscs, etc. (microSD cards are pretty cool though)


My entry into investing was buying stock in Constellation 3D, a fluorescent multi-layer disc/card. At the time its capacity was up there (a ~1TB DVD-sized disc in the late 1990s). It was the clear discs and fluorescent layers that were interesting. The company folded and took my $3,000 with it.


Also microfiber-based data storage media (first paper on this was in 1998) which require new lasers that we still don't have yet.


I guess I'm just going to have to see it in a store to believe it.


What would be the best way to handle a tape drive like this?

Imagine investing in a 1PB tape, and then accidentally dropping it on the floor. Even just setting it down on a table too hard might brick it, or mean it needs to go back to the manufacturer for an expen$$$$$ive repair.

Gravity seems like an unreasonably large risk for such a valuable object. Maybe the ideal environment for these is a zero-gravity one? Perhaps the future of "big data" is somewhere out in space?


If you google this technical white paper from HP (a major LTO vendor) you'll see that the format (in all iterations) is designed to be pretty durable:

4AA4_3781EEE.pdf

In particular, they test dropping tapes hundreds of times from 30" AFF onto concrete in different orientations, making sure they'll still work properly. Not every vendor does this, but they all like to be perceived as archival quality and durable.

I wouldn't be too worried.


Google: "tape library robot"


Are tapes particularly fragile? At least any more fragile than most other media?

I remember being pretty rough with audio and video cassettes, but then again those aren't digital so some damage probably wasn't noticeable.


Or forgetting to remove from pocket during an MRI scan


These tapes are going to be fairly cheap (<$400) by 2035.

But more importantly, if it's an LTO-style tape, it'll be fairly solid.

Even without the case, you can drop them and not worry too much. Lord knows I've dropped loads in my time.


These are not meant to be handled by human hands.


Are you called Linus Sebastian?



