The future of data storage is still magnetic tape (2018) (ieee.org)
97 points by Bluestein on June 13, 2021 | 105 comments



I wish tape drives weren't so expensive. I'd love to have on each personal computer a backup system that just mirrors every disk write to tape.

According to the SMART data for my current system I write about 9 TB per year. That would fit on one LTO 8 cartridge.


When I last calculated the break-even point for a thrifty home user, it was around 50TB of data. Above 50TB and there was a decent chance that tape is cheaper than disk, below 50TB it's unlikely.

The exact tradeoff depends on a lot of factors, but as a rule of thumb you can forget about tape below about 50TB.

The media itself is cheap and durable, but the drives are $1,000 or more and sometimes break--in other words, if you really must have your data, you want two drives.


> The media itself is cheap and durable, but the drives are $1,000 or more and sometimes break--in other words, if you really must have your data, you want two drives.

Newer drives will also drop support for older LTO standards, so you end up having to rely on flimsy old hardware to recover your older tapes. For most home users, it just doesn't seem worth the hassle. Maybe it starts making sense for casual/non-professional users in the 100TB+ range, but even the most committed data hoarders are nowhere near that.


> but even the most committed data hoarders are nowhere near that

Not all that committed, well over that.


"even the most committed data hoarders are nowhere near that."

Eh, i do meet people with ~50TB archives


> Eh, i do meet people with ~50TB archives

Film? Curious ...


Russian librarians say hi


Depends on the needs. Tape will keep longer unpowered. A traditional HDD is a bit riskier if left offline and unpowered for a while. Even if they're all powered in a tower, I'd still recommend something like Backblaze for redundancy.


If you're storing 50TB of data in B2, that's $3k/year. For that price, you can make 2x copies of your data on tape, and then buy two tape drives to read them.
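A rough sketch of that comparison. The B2 rate of $5 per TB per month is an assumption (the $0.005/GB/month list price at the time), and the tape figures are the ones quoted elsewhere in this thread ($1,000 per used LTO-5 drive, about $10/TB for media). The function names are made up for illustration.

```python
# Assumption: B2 storage billed at $0.005/GB/month, i.e. $5 per TB per month.
B2_USD_PER_TB_MONTH = 5.0


def b2_yearly_cost(size_tb: float) -> float:
    """Yearly storage bill for size_tb terabytes kept in B2 (storage only)."""
    return size_tb * B2_USD_PER_TB_MONTH * 12


def tape_upfront_cost(size_tb: float, copies: int = 2, drives: int = 2,
                      drive_cost: float = 1000.0,
                      tape_per_tb: float = 10.0) -> float:
    """One-off cost of `drives` tape drives plus media for `copies` full copies."""
    return drives * drive_cost + copies * size_tb * tape_per_tb


print(b2_yearly_cost(50))     # yearly, recurring
print(tape_upfront_cost(50))  # one-off
```

With these inputs both figures come out at $3,000, but the first recurs every year while the second is paid once. Download fees, tape replacement and drive repairs are ignored.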


B2 is an upgrade option. You can still store an unlimited amount on their standard plan, or upgrade for a little bit more for access to a year's worth of changes. It would be worth it to keep a separate desktop with a dozen drives connected and pay for a second annual subscription if you have that much data.

The one-year retention plan adds $24/year. Even the "forever" retention plan for 50TB would only add $250/year.

If we're talking about home use, B2 really isn't necessary. And if you want a daily rotation for a year, the tape costs can add up, depending on how much data changes day to day. For home use, an HDD backup combined with something like Backblaze seems like a pretty reasonable cost for two layers of redundancy and a year's worth of file version changes.


Kind of a hypothetical scenario for me—backing up 50TB to cloud would saturate my uplink for a full year and more. By comparison, it’s about four days of drive time on LTO-5. So while LTO-5 is a plausible way for me to store 50TB of data, cloud storage is decidedly a fantasy.
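The two durations can be sanity-checked with a few lines. This assumes decimal units and an LTO-5 native rate of about 140 MB/s. The uplink speed is not stated in the comment, so the sketch works backwards to the rate at which 50TB would take exactly one year.

```python
SECONDS_PER_DAY = 86_400


def days_to_transfer(size_tb: float, rate_mb_per_s: float) -> float:
    """Days needed to move size_tb terabytes at a sustained rate in MB/s."""
    seconds = (size_tb * 1e12) / (rate_mb_per_s * 1e6)
    return seconds / SECONDS_PER_DAY


# Assumption: LTO-5 streams at roughly 140 MB/s uncompressed.
lto5_days = days_to_transfer(50, 140)

# Uplink (in Mbit/s) that would need a full year to push 50 TB.
uplink_mbit = (50e12 * 8) / (365 * SECONDS_PER_DAY) / 1e6

print(f"LTO-5: {lto5_days:.1f} days of drive time")
print(f"One-year uplink: {uplink_mbit:.1f} Mbit/s")
```

So "about four days" of drive time holds if the drive is kept streaming, and "a full year and more" corresponds to an uplink of roughly 13 Mbit/s or slower.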

Personally, I'm gonna say I don’t trust arrangements where I know the other side is losing money on the deal. This includes “unlimited” storage. I put “unlimited” in quotes because all of these other services with unlimited storage have, at some point, shut down their unlimited storage plans. That’s why I compare to B2… not really interested in gambling on those things. So I would take B2 as the baseline for comparison when you're talking about this much data.

Amazon once had an unlimited storage option, if you recall. One data hoarder was recording cam girl streams and had just north of 1 PB in his Amazon Drive folder. Whole thing got shut down shortly after.


Bandwidth limitation is a great example of how specific circumstances should dictate backup protocol. Tape absolutely makes sense there even in a home environment. In your case, if it was really critical, I'd probably want to get two tapes at certain intervals so I could ship a copy off site and have the off-site copy online in a dedicated server at a data center so I could retrieve critical files if needed. Probably too extreme for a home setup unless it's a home business, but I've also had to restore from an overnight tape when a database went corrupt (multiple times) so if I was still involved in that sort of sys admin work I'd be a bit paranoid.

As for financial issues with "unlimited", Backblaze has a roughly 15 year track record and has been profitable for almost as long. They're not a typical startup looking to pump up a customer base by selling below costs and then dump to an exit acquisition or IPO. Storage is cheap, and they have managed to build a business that doesn't screw up the economics of low marginal costs by becoming overly bloated.

That said, they're still one of two layers of redundancy I use for my home backups. I've seen cloud services lose files, so I'm banking on the fact that my own storage won't go down at the same time as two cloud services.


B2 Fireball can import 96TB:

https://www.backblaze.com/b2/solutions/datatransfer/fireball...

You can also export B2 data via a USB hard drive, and if you return the drive within 30 days, it's free:

https://help.backblaze.com/hc/en-us/articles/360001925414?_g...


> recording cam girl streams ...

I mean ... somebody has to archive -that- internet content.

For ... science.


You're comparing apples and oranges. Even though B2 and a second tape copy are both "a copy", I'd wager that something stored in the cloud has a much lower failure chance than a second tape copy.


> You're comparing apples and oranges. Even though B2 and a second tape copy are both "a copy", I'd wager that something stored in the cloud has a much lower failure chance than a second tape copy.

Depends on what kind of failures you're thinking of. A tape won't erase itself if your subscription lapses or if the service you chose finds itself on "Our Incredible Journey." As a home user, both those scenarios are pretty high on my list, since I don't have some team to make sure my data is always migrated and accessible. There are a lot of scenarios where some bit of data may need to sit passive and unattended for years/decades, and/or be discoverable and usable by someone else (e.g. family sorting through a dead person's things).


I wasn’t the one who brought up Backblaze.

The idea behind having two copies on tape is that it’s an easy way to get redundancy without special software. This makes it comparable to Backblaze, which presumably uses erasure codes, which allows them to achieve better redundancy with lower overhead (Backblaze’s storage can survive a higher rate of failure, and the encoding requires less overhead than a second whole copy).

What would be unfair is comparing Backblaze against a single tape copy.

However, two copies on tape is a pretty good backup. Paranoid, even.


Having had to restore from tape in very stressful circumstances, if I still did sys admin on that sort of thing I would definitely be paranoid enough to want a second tape off-site. If not daily, then at least on some sort of regular rotation. Though it would depend on the criticality, and for anything really critical I'd want a colo storage server as well. If the data is critical then even for a relatively small business $15k in upfront costs for the tape drives and lots of tapes and annual costs at the colo center wouldn't be all that much.


> in other words, if you really must have your data, you want two drives.

On the bright side, since it's removable media, anyone with a working drive can read your tapes for you.


They used to be a lot cheaper. When I did admin work for a small company ~2004 they weren't as cheap as a DVD R/W, but they were still pretty inexpensive, and I kept rotating 7-day backups on them along with a monthly rotation, and IIRC the system itself was RAID 3. When I upgraded the server I think the DAT tape drive, 4GB per tape, only added a few hundred dollars.

These days a new tape drive goes into the thousands. It's a shame because it's still very difficult to beat tape for high capacity unpowered offline storage. Some optical formats have longer shelf lives but lower capacities, and also aren't cheap.

I'm glad I don't have to worry about backup/restore in my job anymore. I had to make use of those daily tapes more than once and the stress of that was not fun.


It's not just the price either. LTO has backwards compatibility for the previous two generations only. There's a risk data stored on LTO cartridges will not be readable by future tape drives.


Used tape drives with not many hours on them are easy to find on eBay and other places for a fraction of the cost of new drives, and LTO-5 and 6 tapes, for instance, are pretty inexpensive.


How would you make use of data that was "every write", which could just be blocks of subfiles?


You can restore the state of the drive to an arbitrary time in the past by replaying past writes.

For example, suppose something bad happens (a bug deletes important files, someone installs ransomware, etc) and you want to roll back the system three days.

Restore the last full backup you have that is older than the desired rollback time. Then replay from the write log every write that occurred between the time of that backup and the time you are trying to rollback to.
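A minimal sketch of that replay, assuming the write log stores whole blocks with a timestamp. The `WriteRecord` type and `restore_to` function are hypothetical names for illustration, not part of any real backup tool.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class WriteRecord:
    timestamp: float  # when the write reached the disk
    block_no: int     # which block was written
    data: bytes       # full contents of the block after the write


def restore_to(full_backup: Dict[int, bytes], backup_time: float,
               log: List[WriteRecord], target_time: float) -> Dict[int, bytes]:
    """Rebuild the disk image as it was at target_time.

    Start from the full backup taken at backup_time, then replay, in time
    order, every logged write made after the backup and up to target_time.
    """
    image = dict(full_backup)  # never mutate the backup itself
    for rec in sorted(log, key=lambda r: r.timestamp):
        if backup_time < rec.timestamp <= target_time:
            image[rec.block_no] = rec.data
    return image


full = {0: b"a", 1: b"b"}
log = [WriteRecord(10, 0, b"A"), WriteRecord(20, 1, b"B"), WriteRecord(30, 0, b"Z")]
rolled_back = restore_to(full, backup_time=5, log=log, target_time=25)
```

Here the write at time 30 (the "something bad") is left out, so block 0 ends up with the contents written at time 10.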

Another use for it is restoring or reverting files that were originally created more recently than your last regular (full or incremental) backup.

Example: I do incremental cloud backups once a week, on Saturday. Suppose I create a file on Monday, make various changes to it on Tuesday and Wednesday, accidentally delete it or corrupt it on Thursday, and realize this on Friday and want to recover the file.

With the "every write" log, I could find the file in the write log and recover it.

You might at first think that I would not be able to find it because unlike most backup programs the write log does not explicitly record file information. But it doesn't have to explicitly record file information because when you create a file or modify it the OS stores that information by writing to the disk, and those writes end up in your tape write log.

If I'm trying to restore /home/tzs/important_file to the state it was two days ago, the restore process would scan the write log starting far enough back to find the latest earlier write of the root inode (let's assume it was a classic Unix filesystem). Once it has that it can find /home's inode, then can find the correct contents of that on the target date, and so on until it has the inode of important_file. Then it can find what blocks comprised the file, and pull the right versions of those blocks from the write log.

Note that for this to work, the restore program must intimately understand the filesystem whose write log it is working with. This is one of the reasons you still want a more normal periodic full/incremental backup system, using a backup program that explicitly stores files. The write log is best used to supplement the traditional approach.

PS: there was a backup program in whatever AT&T Unix was current in the early '80s (System III) that used the write log approach, although it was not continuous and it wrote redundant data. It was used for incremental backups. You started with a full block level backup. Then when you wanted to do an incremental backup it did this:

1. Make an empty list of blocks.

2. Add the block number of the superblock to the list.

3. Scan all the inodes looking for any inode that has a create or modify time later than the previous backup. For any such inode:

3a. Add the number of the block that contains that inode to the list.

3b. Add the block numbers of the file that the inode refers to to the list, including any indirect blocks.

4. Sort the block number list.

5. Copy all the blocks that are referenced in the list to the tape.

(There was probably a "write some metadata about the backup to the tape" step in there somewhere, too).
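The steps above can be sketched as follows. This is an illustration against a toy in-memory model, not the original System III code: the `Inode` fields, the superblock number, and the use of a set to avoid duplicate block numbers are all assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Tuple

SUPERBLOCK = 1  # assumption: block number of the superblock in this toy model


@dataclass
class Inode:
    created: float
    modified: float
    inode_block: int                       # block that holds this inode
    data_blocks: List[int] = field(default_factory=list)
    indirect_blocks: List[int] = field(default_factory=list)


def incremental_block_list(inodes: Iterable[Inode], last_backup: float) -> List[int]:
    blocks = set()                         # step 1: empty list of blocks
    blocks.add(SUPERBLOCK)                 # step 2: always include the superblock
    for ino in inodes:                     # step 3: scan every inode
        if max(ino.created, ino.modified) > last_backup:
            blocks.add(ino.inode_block)            # step 3a
            blocks.update(ino.data_blocks)         # step 3b: the file's blocks...
            blocks.update(ino.indirect_blocks)     # ...including indirect blocks
    return sorted(blocks)                  # step 4: sort the block numbers


def write_incremental(disk: Dict[int, bytes],
                      block_list: List[int]) -> List[Tuple[int, bytes]]:
    """Step 5: copy each listed block, tagged with its number, to the 'tape'."""
    return [(n, disk[n]) for n in block_list]


inodes = [
    Inode(created=1, modified=5, inode_block=10, data_blocks=[20, 21], indirect_blocks=[30]),
    Inode(created=1, modified=2, inode_block=10, data_blocks=[40]),
]
changed = incremental_block_list(inodes, last_backup=3)
```

Only the first file changed after the last backup, so its inode block, data blocks and indirect block are selected along with the superblock. The untouched file's block 40 is not.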

Note that this approach was redundant, in that if you changed part of a file the whole file would be written to the incremental backup. A pure write log would only write the changed blocks, but most systems don't record sufficiently granular modification information to allow that.

On the plus side, it meant that restoring a file from an incremental backup was easy. The file was either in that incremental entirely or not at all.


What's the ball park cost on a drive+tape (say for a desktop)?


Kind of like asking the cost of a car + groceries.

You can pick up an old tape drive, like an LTO-5, for about $1,000. You can get LTO-5 tapes for about $15 each, with a capacity of 1.5 TB.

This setup puts the cost at around $10/TB, compared to somewhere around $30/TB for disk. Since you save $20/TB by using tape, it takes you 50TB before the total setup is less expensive than disk. Plug in your own numbers if you like.
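For anyone who wants to plug in their own numbers, the break-even arithmetic is small enough to sketch. The function name is made up, and the model ignores interface cards, drive failures and tape wear.

```python
def break_even_tb(drive_cost: float, tape_per_tb: float, disk_per_tb: float) -> float:
    """Capacity in TB at which (drive + tapes) costs the same as disks alone."""
    saving_per_tb = disk_per_tb - tape_per_tb
    if saving_per_tb <= 0:
        raise ValueError("tape media must be cheaper per TB than disk")
    return drive_cost / saving_per_tb


# Numbers from the comment: $1,000 drive, $15 per 1.5 TB tape, about $30/TB for disk.
tape_per_tb = 15 / 1.5
print(break_even_tb(1000, tape_per_tb, 30))  # 50.0
```

Dropping the disk price to $20/TB with the same tape numbers pushes the break-even to 100TB, which shows how sensitive the rule of thumb is to the inputs.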

You'll want to get a fiber channel card too, but that's easy enough.

Newer generations of LTO have marginally cheaper media (per TB) when you're buying in small quantities, but the drives are more expensive, and it's not uncommon to see poor capacity utilization in tape systems. The big players with lots of data will get newer LTO generations because they can save on library space, vault space, and drive time.


The advantage of tape is that you can easily and cheaply keep a daily rotation as well, slot in a monthly, and then an annual to really keep a good cold storage backup. I had to restore from a previous night's tape often enough that for anything critical I'd be paranoid about adequate backup protocols. I don't know the failure rate for tapes, but I think they're lower than mechanical HDD, and definitely last longer unpowered.


If you're in LTO5 land, you can get away with U320 SCSI instead of FC, and this can cut your costs in half. (Chances are if you're asking this question, you're the kind of user who won't be impacted much by the slower R/W speeds).

Also, library units are often cheaper than standalone drives for reasons I don't quite understand. I purchased a 12 tape autostacker for around $250 shipped, paid another company about $50 for the upgrade to make it 24 tapes, and another $50 for the HBA.


It's worth pointing out that these media prices are very US-centric. In the UK, the prices for the media are exorbitant, even for used tapes.


International shipping for tapes can't be that expensive? Although maybe enterprise users don't want grey market tapes.


As a hobbyist with no commercial reason that could justify faffing about with imported tapes, I have a deep hatred for import taxes and the (occasionally very) steep shipping prices across the Atlantic. If I'm gonna have to pony up 200% of the original price to get the tapes delivered to my doorstep, tape drives might as well be fictional to me.


Even in the UK, you can buy them from Amazon Germany or from Amazon France, at only EUR 7 per TB for LTO-8 or around EUR 8 per TB for the older standards.

Previously there were no customs taxes. After Brexit I assume there might be some, but they should not be high enough to add much to this low price.


I'm seeing LTO-5 drives on Ebay for far less than $1000 (more like $150 to $300). Am I missing something obvious?


Those are very old.

The current standard is LTO-8, with 12 real TB of data per cartridge and it has the lowest cost per TB stored.

However, the LTO-8 drives are very expensive, e.g. between $2000 and $4000.


Or, as anyone who's spent the last year patiently waiting in line at a Microcenter has observed, the price of a gaming enthusiast's graphics card.


Used drives. I would be worried about how clean the heads are and whether they are worn down and need to be replaced.


However, if you buy external USB hard drives, it's only about $20/TB, you have full random access, you don't need a fiber channel card, and if you buy a bunch of USB hubs, you don't have to change tapes every 1.5 TB.

IMO the niche where tape still has an advantage is pretty small these days.


The idea of making a rats nest out of 50TB of USB hard drives aimed at consumers is not appealing. They're not as durable as I'd like, either.


Well, my experience with backup tapes is not one of durability either but maybe that's changed in the intervening 30 years or so.


I can't speak to the old generations of tape, prior to 2000. The LTO cartridges seem fairly sturdy to me, and I haven't run into many problems with the media itself. Unlike hard drives, tape media is fairly insensitive to shock, which makes it easy to transport. I've seen cases of water-damaged media, cases where the leader pin broke off (and got taped back on, and the tape was successfully read), cases where tapes were physically lost, cases where the database recording the locations of tapes was corrupted...

...but data durability has been overall excellent, and compares favorably to disk.


If you drop back to used LTO4 you can get an entire setup for less than $500.


Which is great, until the heads wear out. I would hesitate to buy a used drive unless the heads have been replaced, especially since LTO-4 is about 14 years old at this point.


Good point - and to be really secure you should have two drives - write on one and read back and verify on the other.


I work with a lot of tape in the oil and gas sector and one of the major issues we see is that industry wide file formats for data exchange are still based on old tapes. These formats have fixed block sizes that haven't changed since the 1950's. Modern tapes have exponentially larger capacities and bandwidth but due to the use of these file formats we rarely see 20% of the theoretical throughput on a brand new tape drive and cartridge. The industry has no desire to change the file format so we are stuck in this situation where tape read/write times grow exponentially with every new tape generation.


Can you give an example? What could change if we stopped optimizing for legacy stuff?


The main issue is that the standard was written back when file formats and tape formats were essentially the same thing. Data was read from tape, processed and written to a second tape. The memory of machines at the time was on the order of 1 block.

Decoupling tape formats from file format effectively resolves the issue, that is: read the tape in 6k blocks, write the file to a modern file system. Set the tape block size to 10mb. Write the file back to tape as a tar. You get the bandwidth limit of the tape drive.
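A toy illustration of the re-blocking idea: many small records are accumulated and re-emitted as a few large blocks, so that each write to the drive is big. The `reblock` name and the sizes are only for illustration. A real workflow would also have to preserve the legacy record boundaries somewhere (for example inside the archive), which this sketch does not do.

```python
from typing import Iterable, Iterator


def reblock(records: Iterable[bytes], out_block_size: int) -> Iterator[bytes]:
    """Concatenate incoming records and yield fixed-size output blocks.

    The last block may be shorter than out_block_size.
    """
    buf = bytearray()
    for rec in records:
        buf += rec
        while len(buf) >= out_block_size:
            yield bytes(buf[:out_block_size])
            del buf[:out_block_size]
    if buf:
        yield bytes(buf)


# Ten legacy-sized 6000-byte records re-emitted as 25000-byte blocks.
small_records = [bytes([i]) * 6000 for i in range(10)]
big_blocks = list(reblock(small_records, 25_000))
print([len(b) for b in big_blocks])
```

The payload is unchanged byte for byte. Only the number and size of the writes differ.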

The issue is almost all legacy industry software is still designed to read and write directly to tape objects so your tape isn't readable in all the software this data is designed for.


Couldn’t this be addressed with virtual tape devices? The application gets presented with something that is indistinguishable in behaviour from a physical tape drive, but actually backed by a file on SSD or hard disk. Then you can copy that file to a real tape device separately


I can't say for oil and gas but I encountered the same thing in semiconductor manufacturing. CMP machines still use tape formats even if the file is now written to different media. The challenge during transition was in proving reliability and transition costs. Sometimes letting engineers deal with the hassle is cheaper than letting them fix it.


I agree. This would work, but you still need to get the data off the tape, and onto the SSD. The only way to do this is 60MB/sec.


It helps that those tapes can be read in the future easily as well. HP at one point asked the company I worked for at the time if they could use their tape copy program with their new drives so that customers could copy older DAT tapes to a newer version so they didn't have to support older media formats in their newer drives.

The newer media is typically in a new, very fast tape drive, and it is sometimes difficult to saturate those, so they keep stopping and starting and having to reposition over and over.


That's interesting. So the overhead to assemble the file is just so much that you can't stream output to the tape fast enough? That seems like something that could be overcome.

I get that a better format would make it easier, but it seems like you should be able to, with some optimization or disk cache, stream the output fast enough to outpace the tape drive.


I am talking about bottlenecks on reading from the tape. Sending the data to /dev/null eliminates any write bottlenecks and we still max out at 60 MB/sec on a 250 MB/sec drive.


Hrm. That is odd, as I assume these old formats would be something that compresses well, which gets higher read/write speeds.

Edit: Hrm. I believe you can disable hardware compression on some LTO drives. Maybe that would help?


The data is binary formatted floating point data, generally using the IBM floating point format, though IEEE floats are also allowed. Standard compression like gzip basically does nothing. All tape manufacturers claim their hardware based compression gives 2/2.5/3 times the tape capacity and it is all baloney. A 1TB tape holds 1TB of data.

There have been multiple attempts over the years to implement all sorts of algorithms but they have all failed to gain traction. For example Wavelet based compression works fantastically well for some data (i.e. real signals with band limited frequencies), and catastrophically badly for others (earth models defined with piecewise functions containing non-differentiable points). Some people are happy with lossy compression, others are not.

The only compression I have seen worth anything is in deep water, where there is literally no data in the water column (i.e. 500 consecutive zeroes in a 6000 byte block), and gzip gives you that section for free.
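A small experiment along those lines, using synthetic data rather than real seismic traces. The Gaussian noise, the fixed seed and the use of big-endian IEEE 32-bit floats (not IBM floats) are all assumptions made to keep the sketch self-contained.

```python
import gzip
import random
import struct

random.seed(0)  # fixed seed so the illustration is repeatable

# 1500 big-endian 32-bit floats = one 6000-byte block of noisy samples.
samples = [random.gauss(0.0, 1.0) for _ in range(1500)]
noisy_block = struct.pack(">1500f", *samples)

# Same block with the first 500 bytes zeroed, standing in for the water column.
water_block = bytes(500) + noisy_block[500:]

noisy_size = len(gzip.compress(noisy_block))
water_size = len(gzip.compress(water_block))

print(f"noisy block: {len(noisy_block)} -> {noisy_size} bytes")
print(f"with zeros:  {len(water_block)} -> {water_size} bytes")
```

The noisy block should barely shrink, while the version with the zero run should come out smaller by roughly the size of the zeroed section, which is the "free" part described above.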


Might it be possible to embed the legacy format within a more modern format that gets written to the tape?


The article briefly mentions HAMR (Heat Assisted Magnetic Recording) hard drives. Seagate has been shipping them since 2020 to some customers. Seagate's roadmap is 120TB drives in 2030.

https://www.anandtech.com/show/16544/seagates-roadmap-120-tb...

If you are a business then tape is great. But as a consumer with around 50TB of data the problem is that current generation LTO tape drives are in the over $3,000 range. The tapes are reasonably priced but the drive cost is so high that you can buy about 200TB of hard drives for the cost of just the tape drive.


Discussed at the time (of the article, not the future):

Why the Future of Data Storage Is Still Magnetic Tape - https://news.ycombinator.com/item?id=17864392 - Aug 2018 (155 comments)


People are still working to use lasers to write data much faster. TU/e paper from a year ago:

[https://phys.org/news/2020-07-ultra-fast-laser-based-storage...] "Ultra-fast laser-based writing of data to storage devices"

"researchers ...have demonstrated a new approach that can achieve deterministic single pulse writing in magnetic storage materials, making the writing process much more accurate." (on a 3-layer, 15nm surface)

Paper: [https://www.nature.com/articles/s41467-020-17676-6] "Deterministic all-optical magnetization writing facilitated by non-local transfer of spin angular momentum"

Still writing to 2D. I guess laser-write/read to 3D cubes is still a pipe dream.



OK, cool. My browser's blind to MS, but I found an alternative: https://techxplore.com/news/2020-09-microsoft-holographic-so...


A competing but emerging technology to keep an eye on is holographic storage.

"IBM has already demonstrated the possibility of holding 1 TB of data in a crystal the size of a sugar cube and of data access rates of one trillion bits per second"[1]

[1] https://en.wikipedia.org/wiki/Holographic_Data_Storage_Syste...



But not rewritable.


Magnetic tape is typically used for backup or archival purposes most of the time though, so that might not be such a big issue and they'll be used for a similar purpose


That is a feature, not a bug, for long-term archival media.


The other advantage of tape is that ransomware shouldn't be able to encrypt your backups (if written in append mode).


This is what gets me about read-write online backup media. How do you ensure it's not compromised by ransomware? How do you ensure that your snapshots aren't compromised? It boggles the mind to think that defense in depth and software write protection is considered fine these days.


I regularly bring up the subject of physical write-enable switches (that hard drives used to have). Inevitably, someone responds that it's a great idea, and their software has a write-enable setting.

The other mind-boggling thing is I argue that remote updating shouldn't be allowed unless there's a physical switch to enable it. The response is that remote updating is necessary to keep systems secure from malicious remote updates.

I have a paper sleeve that goes over my webcam when not in use.


I would probably pay extra for a switch although a big heavy mechanical switch in the middle of the signal path for a modern SSD would probably not be as cheap as you might expect.


It can even be jumper pins the user who cares can attach their own switch to. I'd pay extra for a slide switch on a USB stick. Heck, USB sticks often come with a slider for a retractable cover, obviously that cost is insignificant.

Hard drives used to come with those jumper pins. They are not costly.


The point I was making is that the signal integrity work would be a lot harder than for a hard drive of yesteryear. The BOM cost of the switch isn't that important


Floppy days! Way back then, I remember there were bootable Linux or FreeBSD servers running off a DVD. The idea was that the server data is then immutable.


Yes, you said read write, but to answer your question at an industrial level, set your AWS IAM permissions to write only for your normal backup role. No delete or modify.
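A minimal sketch of such a policy, built as a Python dict and printed as JSON. The bucket name is hypothetical. One caveat to verify for your own setup: `s3:PutObject` on its own still lets the role overwrite an existing key, so the "no modify" part usually also depends on bucket versioning or Object Lock being enabled.

```python
import json

BUCKET = "example-backup-bucket"  # hypothetical bucket name

# Allow uploads only: no s3:GetObject, s3:DeleteObject or s3:DeleteObjectVersion.
write_only_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AllowBackupUploadsOnly",
            "Effect": "Allow",
            "Action": ["s3:PutObject"],
            "Resource": f"arn:aws:s3:::{BUCKET}/*",
        }
    ],
}

print(json.dumps(write_only_policy, indent=2))
```

IAM denies anything not explicitly allowed, so leaving the delete actions out is what makes the role unable to remove old backups.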


Solving this issue at higher levels of abstraction is precisely what I am against. This should be a feature of the media itself.


I feel like there's a place for home users to rent a tape drive, write everything to tape and return the drive.

Uber for tape-based storage. You don't need to own the drive. Just the tapes. Maybe not even the tapes. Dropbox using tape storage? Maybe. Personally I'd like to own the actual medium.

Ycombinator! Make it happen.


Just use a cloud provider. If you are worried about privacy encrypt it.

For example, AWS Glacier deep archive would be years of storage for the cost of shipping a tape drive. If you have bandwidth issues use snowcones.


Proxmox Backup Server has LTO tape support now:

https://pbs.proxmox.com/docs/tape-backup.html

IIRC they rewrote the tape driver in userspace in rust.


Huh. I glossed over the article but found no explanation for why that was necessary. Do you happen to know by any chance?


This is from 2018, I'd be curious to hear if anything has changed in the past few years.


Anecdatum:

In 2018 I was running a consulting business helping customers with backup to tape, but even back then most customers were doing backup to a dedupe (magnetic disk) library like the HP StoreOnce, but then often copying that spooled-to-disk data to tape once per week or once per month.

Now (2021) the flavour-of-the-month is backing up to cloud storage (s3 or Azure blobstore), sometimes directly, sometimes as a copy from deduplication storage. This morning's email summary tells me that one of my customers has just recently stopped using their on-site tape library; I have a quote out for some work with another customer to decommission theirs and replace it with writing to a cloud-hosted provider.

So tape is still getting used (assuming that's what Deep Glacier actually is), but it isn't owned by the customer. But if you are backing up to Google's blob storage, then no, it isn't tape, it's just magnetic storage again, just in low-speed access sections of disk.

The product I was working with (DataProtector) has gone through a bit of churn as it changed hands from HP to Microfocus, so this could cloud the numbers a bit.


Magnetic tape simply has a far bigger working area than HDDs. An HDD has only its disk area, multiplied by the number of disks. Magnetic tape is wound, which makes the total surface far larger.


Not "simply" when the densities are so different.

They both happen to be similar in price, but if the density gap grew by 3x then suddenly tape would be more expensive and only used for certain types of archival.


The areal density of tape is far lower, but the volume density is higher, simply because HDDs work only on 2D planes while tapes are wound. If the density gap grew, magnetic tapes would try to put more tape in a single enclosure, like HDDs try to put more disks in a single enclosure.


> the volume density is higher

Not really. If you compare the current best tapes and hard drives, the hard drives store exactly the same amount per cubic centimeter. If you look at just the 10cm by 10cm part with the platters, and compare it to the 10cm by 10cm tape cartridge, the hard drive is fitting more data into less space. Once LTO-9 is available then it will win the first comparison but it will be just about tied for the second comparison.

> If the density gap grew, magnetics tapes will try to put more tape in a single enclosure, like HDDs try to put more disk in a single enclosure.

You can only fit so much more tape in the same box. They've already been increasing the tape length as much as is reasonable, but from LTO-3 to LTO-9 it's less than a 50% increase. Much like hard drives have managed to fit in a couple extra platters.


What really counts is the volume density for the entire device, i.e. for a LTO tape cartridge vs. a 3.5" HDD.

Comparing a 12 TB tape with an 18 TB 3.5" HDD, the volume of the cartridge is about a half or less, so the tape still has better density.

Moreover the tape cartridge is many times lighter, if you carry in your pocket a tape cartridge vs a 3.5" HDD, the difference is very significant. You would not notice the tape cartridge, while the HDD would be like having a stone in your pocket.

Experimental tapes have already been demonstrated at volumetric densities much better than what HDDs hope to achieve in the future.


> Comparing a 12 TB tape with an 18 TB 3.5" HDD, the volume of the cartridge is about a half or less, so the tape still has better density.

The volume of a tape is very close to 60% of a hard drive. So 12TB tape and 20TB drive are very close. If we use 18TB then it's still close.
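The ratio can be checked from the outer dimensions. The figures below are assumed nominal form-factor sizes (an LTO cartridge at about 102.0 x 105.4 x 21.5 mm and a 3.5" drive at about 101.6 x 147 x 26.1 mm), not measurements of any particular product.

```python
from typing import Tuple

# Assumed nominal outer dimensions in millimetres (width, depth, height).
LTO_CARTRIDGE_MM = (102.0, 105.4, 21.5)
HDD_35_INCH_MM = (101.6, 147.0, 26.1)


def volume_cm3(dims_mm: Tuple[float, float, float]) -> float:
    """Volume of a box in cubic centimetres, from dimensions in millimetres."""
    width, depth, height = dims_mm
    return width * depth * height / 1000.0


def gb_per_cm3(capacity_tb: float, dims_mm: Tuple[float, float, float]) -> float:
    """Whole-device volumetric density in GB per cubic centimetre."""
    return capacity_tb * 1000.0 / volume_cm3(dims_mm)


ratio = volume_cm3(LTO_CARTRIDGE_MM) / volume_cm3(HDD_35_INCH_MM)
print(f"cartridge / drive volume: {ratio:.2f}")
print(f"12 TB tape: {gb_per_cm3(12, LTO_CARTRIDGE_MM):.1f} GB/cm^3")
print(f"18 TB HDD:  {gb_per_cm3(18, HDD_35_INCH_MM):.1f} GB/cm^3")
print(f"20 TB HDD:  {gb_per_cm3(20, HDD_35_INCH_MM):.1f} GB/cm^3")
```

Under these assumptions the cartridge is a little under 60% of the drive's volume, so a 12TB tape is slightly denser than an 18TB drive and roughly level with a 20TB one.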

But I wasn't even making an argument about the current tech. I was talking about if things were nudged just a bit. If we go back to imagining the world where the huge areal density gap is made slightly larger, then hard drives would crush tapes in both price and density. Despite tapes having a far bigger working area.

So the analysis is not "simply" that tapes have a far bigger working area. If it was, we'd have hundreds of terabytes in a tape right now.

> Moreover the tape cartridge is many times lighter, if you carry in your pocket a tape cartridge vs a 3.5" HDD, the difference is very significant. You would not notice the tape cartridge, while the HDD would be like having a stone in your pocket.

That's such a niche use it would barely affect sales. Effectively nobody carries around pocket tapes, because if you can afford a drive then you have stacks and stacks of tapes.

> Experimental tapes have already been demonstrated at volumic densities much better than what HDDs hope to achieve in the future.

That's cool but I'll wait to see when/if they're practical to make into a product, and whether something else has come along to make both of them blush.


Where I work we moved to a fancy backup solution a few years ago. Which one doesn't matter, but suffice it to say the name implies the focus isn't on metal.

Not a single one of our metal machines ever successfully backed up with it. Tens of TBs per system seems to be where it just fails every single time. We put those systems back on the officially retiring tape system and now all their stuff is backed up without a hitch every time.


Kind of semi related: What do you buy if you want to do tape backups at home?


Probably an LTO drive a few generations old and a SAS card. Current generations cost too much and you aren't going to be able to keep up with the data rate that they prefer.


From personal experience you are not going to be able to saturate even a horrifically obsolete LTO drive and SAS interface without specialized software or purely sequential data. Most of the cheap/free ways of running tape drives aren't optimized for parallel I/O and will horrifically shoe-shine (that's the rev-up and rev-down sound you hear when the tape drive isn't getting enough data), which isn't great for the tapes and massively increases backup time and drive usage.

AFAIK most commercial tape deployments nowadays are disk-to-disk-to-tape arrangements. All the actual data is serialized to an archive on disk first, and then that serialized archive is written to tape at full speed. This minimizes tape wear and ensures your very expensive tape drives are being used efficiently.
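A minimal sketch of the first half of that arrangement (serialize everything to one archive on a staging disk), assuming a plain uncompressed tar is acceptable. The function name and paths are hypothetical, and real backup software does a lot more (catalogs, verification, spanning tapes).

```python
import tarfile
from pathlib import Path


def stage_archive(source_dir: Path, staging_dir: Path, name: str) -> Path:
    """Serialize a directory tree into a single tar file on the staging disk.

    Slow, random reads of many small files happen here, away from the tape
    drive. The finished archive can then be read back sequentially and
    written to tape in one pass.
    """
    archive = staging_dir / f"{name}.tar"
    with tarfile.open(archive, "w") as tar:  # "w" means uncompressed tar
        tar.add(source_dir, arcname=source_dir.name)
    return archive
```

The staging directory should live outside the tree being archived, otherwise the archive would try to include itself.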


I am using an LTO-7 drive connected to a FreeBSD server.

FreeBSD always manages to write the tape continuously at about 300 MB per second, which is the maximum speed for LTO-7.

All the files sent to the tape are grouped into large archive files, and for the dd command that writes to tape I use a block size of 128 kB.
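For anyone who wants to see what a fixed block size means in practice, here is a rough Python equivalent of a dd-style copy with 128 kB blocks. This is only a sketch, not the exact command I run; the destination would be the tape device node (on FreeBSD typically something like /dev/nsa0, but that depends on your setup), and the function name is made up.

```python
import time

BLOCK_SIZE = 128 * 1024  # 128 kB per write, as with dd bs=128k


def write_in_blocks(src_path: str, dst_path: str, block_size: int = BLOCK_SIZE) -> float:
    """Copy src to dst one block per write call and return the rate in MB/s."""
    total = 0
    start = time.monotonic()
    # buffering=0 so every write() is handed to the device as one block
    # instead of being regrouped by Python's own buffer.
    with open(src_path, "rb") as src, open(dst_path, "wb", buffering=0) as dst:
        while True:
            block = src.read(block_size)
            if not block:
                break
            dst.write(block)
            total += len(block)
    elapsed = time.monotonic() - start
    return total / 1e6 / elapsed if elapsed > 0 else float("inf")
```

The last block is usually shorter than the others, which is also what dd does when no padding is requested.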

The tape commands from FreeBSD are more convenient than those from Linux, which have not seen much maintenance in recent years.

Obviously, you cannot reach tape speed when making the backup directly from an HDD or over 1 Gb/s Ethernet.

You must write the backup to tape either from a fast SSD, or from 10 Gb/s Ethernet coming from a fast SSD at the other end, or from a RAM disk configured on the server, if you have enough memory.
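The Ethernet part is simple arithmetic. A small sketch, using raw line rates in decimal units and ignoring protocol overhead (so real throughput is somewhat lower):

```python
TAPE_MB_S = 300  # LTO-7 write speed quoted above


def link_mb_per_s(gbit_per_s: float) -> float:
    """Raw line rate of a network link in MB/s (1 Gb/s = 1000 Mb/s, 8 bits per byte)."""
    return gbit_per_s * 1000 / 8


for gbit in (1, 10):
    rate = link_mb_per_s(gbit)
    verdict = "can" if rate >= TAPE_MB_S else "cannot"
    print(f"{gbit} Gb/s is {rate:.0f} MB/s and {verdict} keep up with {TAPE_MB_S} MB/s")
```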

To avoid unnecessary wear on the SSDs of the server where the tape drive is located, I configure a large RAM disk on the server whenever I write a backup. The backup files coming through Ethernet to the server are written to the RAM disk on the server, then they are copied to the tape.

With this arrangement it is very easy to ensure that the tape drive is written at maximum speed without any hiccups.


Hi,

How is that LTO-7 drive attached to the host? Is it a USB 3.0 external drive or an internal drive with an additional controller?

Thanks.


It's probably fibre channel if external, or SAS if internal.


I would very much like someone to introduce a tape drive with a USB interface, using the USB Attached SCSI Protocol.

This would make no difference to software, as the tape drives would continue to be seen as SCSI devices. The performance of USB is adequate and a USB tape drive could be cheaper, while also saving the cost of a SCSI HBA.

Unfortunately, there are no tape drives for modern tape standards with USB and no hope that one will appear soon.

I use a tabletop tape drive. Internal tape drives are a little cheaper, but I think that the external drive is more reliable, because it is protected from dust when not active and it is well cooled (even if it is noisy) when it is active.

Previously I used a SCSI HBA card for PCIe, together with an external SCSI cable having the appropriate connectors.

The server motherboard that I am using now has on-board SCSI, so I have a cable from the motherboard to the case, with internal SCSI connectors at one end, to plug into the motherboard, and external connectors at the other end, in a cover for the opening of an expansion card slot on the case. Then I have an external SCSI cable to connect the tabletop drive.

With SCSI cables, you must pay attention to the connectors at each end, because there are many variants. On the motherboard I have internal high-density connectors, on the case I have external high-density connectors, while on the tape drive I have external lower-density connectors.

I have used 2 kinds of SCSI HBA cards, some with LSI controllers, e.g. LSI LSI00343 SAS 9300-8E Host Bus Adapter, and a similar card with a Microsemi controller (now Microchip). There were no differences, all were OK.

The SCSI HBA cards may have either external connectors or internal connectors or both kinds, so you must choose the appropriate card, depending on whether your drive is internal or external.

The only difficulty that I had in the past was that on some motherboards the computer did not boot with the SCSI card inside and I could not understand why.

I even bought a second SCSI HBA card from a different vendor, wrongly believing that the first one was buggy.

The problem turned out to be not a bug, but a standard feature :-(

When booting in legacy mode, some add-in cards, including all SCSI cards and all GPUs, attempt to map their ROMs, which add BIOS functions, into the address space above 640 kB but under 1 MB, which is accessible in Intel real-address mode.

The SCSI cards do this for the case where you attach an HDD or an SSD and want to boot from it, which the MB BIOS does not know how to do.

On the computers that refused to boot, without any error message, I also had an NVIDIA GPU. The combined size of the SCSI ROM and the NVIDIA ROM exceeded the reserved address area mappable by the BIOS, so the BIOS failed to initialize the GPU. When either one of the 2 cards was present booting was OK, with both cards the computer remained stuck with a blank screen.
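As a toy illustration of that failure mode: the window size and the ROM sizes below are made-up numbers chosen only to show the effect, not measurements from my cards, and real BIOSes differ in how much space they reserve and how they pack the ROMs.

```python
# Assumed size of the legacy option ROM window (hypothetical, varies by BIOS).
LEGACY_ROM_WINDOW = 128 * 1024


def roms_fit(rom_sizes: dict, window: int = LEGACY_ROM_WINDOW) -> bool:
    """True if all option ROMs together fit in the window.

    Ignores alignment and any shrinking of ROMs after initialization.
    """
    return sum(rom_sizes.values()) <= window


# Hypothetical ROM sizes in bytes.
gpu_only = {"gpu": 64 * 1024}
hba_only = {"hba": 96 * 1024}
both = {**gpu_only, **hba_only}

for label, roms in (("GPU only", gpu_only), ("HBA only", hba_only), ("both", both)):
    print(label, "fits" if roms_fit(roms) else "does not fit")
```

With these example numbers each card fits alone, but the two together do not, which matches the symptom I saw.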

The solution for those cards to work together was to go into the PCIe settings in the BIOS, select the slot where I intended to put the SCSI card and disable the mapping of the extension ROM.

However, not all BIOSes have these options. On one of my Supermicro MBs, the BIOS had the option, but it did not have any effect, due to a BIOS bug. While I have been very content with the hardware of my Supermicro motherboards, I have encountered a lot of bugs in their BIOSes.

Since I use the HBA only for tapes, disabling the ROM has no consequence: I never want to boot from SCSI.



It may even be possible to find on eBay a multi tape loader that has a disk inside and appears to the system as just a dumb tape drive.


Very insightful comment on the data rate. If you can't feed the LTO drive properly, it may be stopping and starting all the time, which could have a detrimental effect on its mechanism. But this is probably easily fixed by spooling (using Bacula terminology): you buffer something like 100 GB, then write it in one batch. Plus, some drives support variable speeds.

In addition to SAS, a fibre channel card would also fit the bill, albeit probably a trifle more expensive. But if you go for low speeds (e.g. 8 Gbps), those cards can be less than $100.


Yes, that is how it should be done.

For example, I collect all the files that I send to backup in archive files whose size is approximately 50 GB, then I copy those 50 GB files to the tape.

Done this way, the write speed to an LTO-7 drive is always constant at 300 MB per second, which is the maximum possible for that standard (and much higher than what an HDD can sustain, so the 50 GB files must come from a fast SSD or from a RAM disk).
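To give a feel for the timescales, a quick calculation, assuming decimal units and a steady 300 MB/s. The 6 TB figure is the native (uncompressed) capacity of an LTO-7 cartridge.

```python
def write_seconds(size_gb: float, rate_mb_s: float = 300.0) -> float:
    """Seconds needed to stream size_gb gigabytes at a constant rate."""
    return size_gb * 1000 / rate_mb_s


per_archive = write_seconds(50)
full_tape = write_seconds(6000)  # 6 TB native LTO-7 capacity
print(f"One 50 GB archive: about {per_archive / 60:.1f} minutes")
print(f"A full 6 TB tape: about {full_tape / 3600:.1f} hours")
```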


I didn't know -- or perhaps remember -- that you had to match the feed against the speed, sort of like the way you had to be very careful about burning CD-Rs in the old days, but that makes perfect sense.

I have been fearfully realizing that I will soon need tape backup for my next project and this is helpful. Now I am wondering if a RAID 0 of multiple HDDs could provide the ~300 MB/s speed needed.


You do not have to match the speed, but you should.

Sending the data to the drive fast enough will reduce the total time the drive is active during a backup, and it will also avoid a lot of starting and stopping of the drive during the transfer.

Both eliminating the start-stop cycles and reducing the total active time will increase the life of your tape drive.

On my server I have 128 GB of DRAM, which makes it extremely easy to ensure maximum speed without wearing an SSD.

When the backup starts, I create an 80 GB RAM disk, larger than the archives of up to 60 GB in which I collect the files sent to backup.

With less memory one could either use smaller chunks or use an SSD instead of the RAM disk, but that seems wasteful.

Then the archive files are buffered in the RAM disk and written in one command, without pauses.


> Current generations cost too much and you aren't going to be able to keep up with the data rate that they prefer.

Maybe. Keeping up only really matters if your slowest data rate is going to be between 50MB/s and 120MB/s. If it's below that then you have a strong need for a buffer drive no matter what tape generation. And if it's above that then you shouldn't have any real trouble no matter what tape generation.
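Spelled out as a tiny decision function, with the 50 and 120 MB/s thresholds taken straight from the rule of thumb above (they are rough figures, not drive specifications, and the function name is made up):

```python
def buffer_advice(slowest_mb_s: float, low: float = 50.0, high: float = 120.0) -> str:
    """Classify a source by its slowest sustained data rate in MB/s."""
    if slowest_mb_s < low:
        return "needs a buffer drive with any tape generation"
    if slowest_mb_s <= high:
        return "tape generation matters, check the drive's minimum speed"
    return "should be fine with any tape generation"


for rate in (30, 80, 200):
    print(f"{rate} MB/s: {buffer_advice(rate)}")
```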


Three questions as one who is mostly ignorant of magnetic tape:

1. How expensive is this medium per unit of memory (for example, per terabyte)?

2. How well (practical effectiveness) can it connect to home devices and be used to store information such as photos, texts, documents, videos etc? Related sub-question: how well does its lasting power compare to HDD?

3. Where/how would one buy reliable magnetic tape storage devices?

In case anyone who knows the subject well wouldn't mind answering a bit.


The author's bio "Mark Lantz is manager of the Advanced Tape Technologies at IBM Research Zurich."

LOL, ok sure man, the future of storage is definitely tape. :rolls eyes:


"It should come as no surprise that recent advances in big-data analytics and artificial intelligence have created strong incentives for enterprises to amass information about every measurable aspect of their businesses. And financial regulations now require organizations to keep records for much longer periods than they had to in the past. So companies and institutions of all stripes are holding onto more and more.

Studies show [PDF] that the amount of data being recorded is increasing at 30 to 40 percent per year. At the same time, the capacity of modern hard drives, which are used to store most of this, is increasing at less than half that rate. Fortunately, much of this information doesn’t need to be accessed instantly. And for such things, magnetic tape is the perfect solution.

Seriously? Tape? The very idea may evoke images of reels rotating fitfully next to a bulky mainframe in an old movie like Desk Set or Dr. Strangelove. So, a quick reality check: Tape has never gone away!"


I mean, I really grok tape having detractors. Granted ...

... but, to the point of downvotes? :)



