50TB IBM tape drive more than doubles LTO-9 capacity (blocksandfiles.com)
143 points by voxadam on Aug 25, 2023 | 173 comments



It's weird how tape size just seems to increase steadily in a smooth curve to roughly match whatever hard drives are doing. It makes me wonder sometimes if they had this tech all along, but it was better just to slow-roll it out so they could sell multiple generations of hardware. I guess only the material scientists at these companies know what's really going on.


Exactly.

IBM has had several generations of tape tech in the can, and has managed the rollouts with calm competence since at least '97, when I got started in the backup business.

Their research is a prominent part of the LTO platform, too. My observation of that landscape was that they fed technology into LTO at a rate calculated to extinguish the utility of tape research elsewhere. The result is a plausible-looking competition, which IBM wins every time.

This may sound like I'm down on them for it; I'm not. Their hardware was rock solid, performed as advertised.


Huh that's pretty smart if true.


Look at Wikipedia [0]; they have it planned out for the next ten years like clockwork.

[0] https://en.wikipedia.org/wiki/Linear_Tape-Open#Generations


> Fujifilm has demonstrated a 580TB raw capacity tape using Strontium Ferrite particles so there is a lot of runway for future capacity increases.

I would guess so.


Well, yes, the article mentions the potential of the strontium ferrite, with existing prototypes having hundreds of terabytes of capacity.

Similarly, HAMR and MAMR will be milked for years in the world of magnetic hard drives.


In one way, it's better to have multiple generations: you have to fill the tapes, and each new generation brings better transfer speeds. Also, thanks to backward compatibility, old tapes get migrated to new tapes, so the shelf longevity of the tapes is never really tested.



What’s the point of reporting compressed capacity? Isn’t data typically compressed at rest, negating any value of the drive doing it?


After having helped on a lot of tape storage implementations, I get why it's useful for quick estimates on backup solutions.

Compression really is a game changer once you're past a certain amount of data to protect, and backup technology does some really clever space saving at basically every stage, which makes it hard to find a good, simple metric to help sell to management. It's a lot easier to tell management:

"These tapes hold 50 TB of production data, and we have about 45 TB in production. Backups and restores will be faster, we'll have fewer tapes to manage for backups"

than it is to explain the different types of data you need to move to tape, the expected compression ratio, etc.

That being said, I do agree they market it too much and it gets confusing, as it should only be a reference for implementation and quick calculations.


Isn't it harder when you have to report to management later that, actually, your 45TB of production data doesn't fit on the 50TB tape?

I suppose compression ratios vary a lot between users of these things, right? And if you are backing up user-generated data, you don't know in advance how well it will compress.


In this case "The latest TS1170 drive supports 50TB JF media cartridges with 150TB compressed capacity through 3:1 compression"

So it looks like 50TB raw, if I am reading this right.

For these sorts of solutions you are usually not using just one tape. The tape is the cheap part; the drives cost a decent amount, though. So in your case, if the data does not fit, you buy another tape, pop it into the existing drive, and finish the backup. You would also have some sort of rotation schedule and at least a few dozen tapes, and if you are doing it right, offsite and onsite rotation too. An extra tape in that mix is not something they would worry about.

When I did this for a small org, I had the weekends to do a full backup of all of our computers. During the week I would do incrementals. At the end of the week I would ship the previous week's tapes offsite and start the next full backup. Not perfect, but it gave us onsite recovery one day back, and offsite (building gone) one week back. The offsite would keep about six months of tapes and return them about once a month.


In the previous century we'd restore the production tape backups to a "production playground", which verified that the backup actually worked and also gave us a production-like testing environment.


Yeah. For most operations, you pay for drive throughput & redundancy, and the actual storage itself is “too cheap to meter”.

Except when it isn’t, but those customers know who they are.


These are the same customers that only make 1 tape of the data. One facility I was at required 3 copies written by 3 different writers. These were in large robot operated units, so it was all automated by our front end. One copy stayed locally, the other 2 went to 2 different offsite locations in different parts of the country with the closest being in another town across state. Each tape had to be removed from the writer and placed into a different drive to do a read back for comparison. In the LTO4 days, we would catch lots of write errors like this. I wonder if the newer formats are any better about these errors? How I miss the old days of tar/mt commands, er, no, who am I kidding!


Odds are you already have a backup system and so you know from experience with your old system how much compression you really get.


>After having helped on a lot of tape storage implementations, I get why it's useful for quick estimates on backup solutions.

Can't you just do that estimate in your head? If you know your data is mostly text and has a 3:1 compression ratio, then when you buy your tapes you can multiply the raw capacity by 3; if it's mostly already-compressed images (e.g. GIF, JPEG), you're probably not going to see much compression at all.

Does going with some tape drive manufacturer's estimate really help you?


> "These tapes hold 50 TB of production data, and we have about 45 TB in production. Backups and restores will be faster, we'll have fewer tapes to manage for backups"

Sure, if "production" means database. Which is sometimes sensible and sometimes extremely far off.

> than it is to explain the different types of data you need to move to tape, the expected compression ratio, etc.

To some extent, but you mostly just need two numbers. Database capacity and photo/video/audio capacity.


Surely how well it compresses depends on the material you're backing up

It seems they went all in on the "compressed" value after missing the doubling of the real space in LTO6.


Especially if you back up encrypted data (which isn't unusual - having data encrypted at rest), you will get a compression ratio of 100% or even slightly more (the compression metadata needs to be stored too).
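
Quick demonstration of that, with hypothetical file names: 100 MB of zeros compresses to almost nothing, but the same bytes after AES-CTR encryption come out slightly larger than the input once gzipped.

    head -c 100M /dev/zero > plain.bin                    # highly compressible input
    gzip -k plain.bin && ls -l plain.bin*                 # shrinks to roughly 100 KB
    openssl enc -aes-256-ctr -pbkdf2 -k secret -in plain.bin -out enc.bin
    gzip -k enc.bin && ls -l enc.bin*                     # gzip output slightly larger than input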


The drives will do encryption too so that you don't have to pick between encryption at rest and compression.


From the SSL world we know that encrypting compressed data can lead to vulnerabilities in the encryption. Does that not apply here? https://en.wikipedia.org/wiki/CRIME?wprov=sfti1


It doesn't really apply, as both CRIME and BREACH rely on observing small differences in payload size and on heavy use of repeated chosen plaintexts to iterate through the secret state space.


Not once in my days of using tape storage have I ever enabled compression; it always seemed like a waste for my specific use case of storing large media files, specifically video. Unless the video is in some sort of raw, uncompressed format, which is close to never, video that has already been through a codec won't benefit from this compression.


Tape compression is probably not meant for your use cases; it's likely meant for large government institutions or corporations that generate petabytes of uncompressed text data a week - so instead of needing to manage compression as a separate process, the tape drive just does it without any other processing needed. The data can go from collection/creation to long-term storage without any other steps in between. For some organizations that's a big win. For the average home user backing up their Plex, it might seem unnecessary.


There's still a fair bit of mainframe-format data (zoned decimal, etc.) out there. These are typically in defined-data formats, a/k/a hierarchical formats, which are sort of a hybrid between data files and text files ("records" are lines, but often with multiple formats within a single file). These can achieve compression ratios of ~90%.

Standard written text typically runs about 50%.

Compressed audio, image, and video data of course compress very little, if any.


Lots of home users backing up their Plex with an LTO-5 or greater, are there?


I am one of those home users with an LTO5, but I do not use tape compression and I have no illusions about what it was made for. I write tapes without any compression, because the stuff I write can't be compressed. Tape compression wasn't made for my use case, it was made for organizations that generate petabytes of text data.


   tank                    compressratio  1.14x  -
   tank/backup             compressratio  1.43x  -
   tank/movies             compressratio  1.01x  -
   tank/music              compressratio  1.01x  -
As you can see, my actual file backups (servers and clients) compressed somewhat, but movies and music I got nothing (this is from ZFS but tape compression will be similar).
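
For reference, that listing is roughly what you'd get from something like the following (pool name is of course mine), minus the header line:

    zfs get -r -t filesystem compressratio tank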


Back when I still managed an LTO tape library (which was way back when LTO-4 was state of the art), our data was encrypted before writing it, also negating any compression. And back then the compression ratio was advertised as 2:1, not 3:1.


This was before the introduction of the much lauded "Middle Out" compression.


Purely legacy: that's how it's always been reported.


It takes advantage of shallow purchasing decisions.

If all your competitors report 30 TB*, and you report 15 TB, you may have a better product, but what the people with purchasing power want is bigger numbers, and you're advertising a smaller one.

*with 3:1 compression


If so, why not state 100:1 compression on the canonical FF*N data set?


Because, contrary to popular belief, people are only finitely foolish.

But in fact this did happen to some extent. LTO tapes used to advertise with 2:1 compression ratio. Then that became 2.5:1. And now IBM is doing 3:1

And that's not because we just discovered how to improve on Lempel-Ziv. The switch to 2.5:1 was around 2012. LZMA is 20th century tech, and people still use gzip today. Compression technology doesn't account for the advertising number inflation there.


Because all the companies have to do it in lockstep (see "reach vs standout" recently in tape measures).


Because nobody has yet bothered to sue them for false advertising, so they continue to get away with it.


Bigger number for marketing


> The new tape cartridge media, also called 3592 70F, uses Strontium Ferrite particle technology.

If anyone else is wondering if this is radioactive, it likely isn’t.

> While natural strontium (which is mostly the isotope strontium-88) is stable, the synthetic strontium-90 is radioactive and is one of the most dangerous components of nuclear fallout, as strontium is absorbed by the body in a similar manner to calcium. Natural stable strontium, on the other hand, is not hazardous to health.


Natural strontium is not only non-radioactive, it is also one of the elements with the most stable nuclei (compared to their neighbors), together with barium, tin and lead, because they have "magic" numbers of neutrons or of protons.

Because of this, strontium is one of the most abundant elements in the universe among those heavier than the elements of the iron peak, i.e. among the elements that cannot be produced by fusion in stars.

There are various living beings which use either salts of strontium or salts of barium for their skeletons, instead of the more common salts of calcium (because the salts of Sr and Ba, especially the sulfates, are less soluble than the corresponding salts of Ca).

In magnetic tapes, either barium ferrite or strontium ferrite are preferred to ferromagnetic metals that have been used in the past, because they already are oxides, so they cannot be oxidized by the air, which enables long lifetimes for the magnetic tapes, of at least thirty years.


>In magnetic tapes, either barium ferrite or strontium ferrite are preferred to ferromagnetic metals that have been used in the past, because they already are oxides, so they cannot be oxidized by the air, which enables long lifetimes for the magnetic tapes, of at least thirty years.

This was very interesting. I quit paying attention to tape formulations decades ago, but even when paying attention, I only knew of the names vs actually knowing this level of detail.


I second quakeguy, great reply; love seeing material science on HN.


Awesome reply, thx!


In fact, small amounts of strontium seem to be good for your bones

https://www.webmd.com/osteoporosis/strontium-treatment-osteo...


Erm, that webmd article you linked is for strontium, not strontium ferrite -- Sr vs SrO·6Fe2O3.


Not saying strontium ferrite is good for you, but remember that when we talk about sodium intake or iron intake or whatever, that includes multiple different compounds of those elements, usually not the pure metal. That article actually mentions two different forms of strontium and hints at others.


That's fair and I was curious if the same thing would happen here. But this definitely isn't my domain :)

In any case, here's a lovely FDA safety data sheet of strontium ferrite. It... uh.. apparently doesn't cause pain when rubbed into the eyes of rabbits and guinea pigs. No LD50 tests were taken for the sheet though, so that's bittersweet I guess.

https://www.advancedenergy.com/globalassets/non-resource-lib...


Someone that's worried about radioactivity is talking about the atoms. And so is the webmd article.

Zero mismatch.


that might be because the post they responded to was also talking about elemental strontium:

> While natural strontium (which is mostly the isotope strontium-88) is stable, the synthetic strontium-90 is


If you knew you were going to be exposed to nuclear fallout, could you ingest a lot of natural strontium to minimize the amount of strontium-90 your body could absorb?


As a kid I asked my dad to buy a tape backup drive so I could do backups of our Win95 machine. I’d run backups religiously, to the chagrin of other members of the household who were unable to use the machine during a backup.

Later at my first job I was responsible for performing tape backups and rotating them. The most recent would go into the fire safe, and the one before that would go with the owner to take home.


I'm still looking for an affordable external tape drive. There's RDX but their tapes seem a lot more expensive per TB.

e.g. LTO 3TB costs 30 EUR; RDX 2TB costs 265 EUR.


RDX is disk-based, not tape-based (that's the D in the name).

https://buy.hpe.com/us/en/storage/disk-storage-systems/remov...


Same issue for me. I need to archive a bunch of data once or twice, put the tape on a shelf in my parents' home, and then I don't plan to do anything with it unless my NAS self-destructs.

Ideally I would be able to rent a tape drive, even if it costs 100 dollars per day it wouldn't really matter.


I've considered getting an external Blu-ray writer and just a bunch of quality inorganic blank discs as a tape alternative archiving solution.


That's exactly what I've been looking into lately.

Did you have any specific discs in mind?

I recently bought "Sony Blu-ray BD-R XL 128GB", but I couldn't find any info on whether it's organic or inorganic.


Any external Blu-ray reader/writer you'd recommend (one which supports discs from 25 GB up to the 128 GB you mention)?


The way I've understood it is that BDs are inorganic as long as you don't buy LTH discs.


Are all LTH discs marked on the package, or how else do I find out if my Blu-ray discs are LTH?


It should note somewhere that they're LTH discs since they're fundamentally quite different from normal HTL discs.


You might want to consider M-Discs.


AFAIK, M-Discs don't really make that much of a difference if you're not dealing with DVDs.


Do you mean in comparison to regular BDXL Blurays?


Basically all non-LTH BDs. The way I've understood it, the big selling point of M-Disc DVDs vs. regular DVDs was the inorganic dye used in them. But if you buy a non-LTH BD, you'll have an inorganic dye anyway.

The company that actually made M-Discs has also gone bust, and there's a lot of conflicting information on the Internet about whether or not the current M-Discs are real M-Discs.


Define "affordable". Lots of LTO-6 drives available for less than US$ 1000:

* https://www.ebay.com/b/LTO-6-Tape-Data-Cartridge-Drives/3997...

Here's an eight-slot SAS library:

* https://www.ebay.com/itm/385724855188


If you intend for your tapes to fit in the library, 20TB for $1000+$200 does not qualify as affordable.

It's reasonably possible to hit a target of 2x hard drive cost with loose tapes but you need a lot of data.

And trying to match the price of hard drives with $10/TB tapes will require piles of data and piles of tapes.


How rich will I need to be to get one of these for my home storage array? It is hard to find a cloud backup provider that will take 10+TB without freaking out on price.


Without knowing how much storage space you have or what kind of array you own, I can tell you with 99.99% certainty that it would be far cheaper to just buy the same storage hardware and copy your data to it.


I get that, but I would really like tape backup if it were viable; it's simpler and more compact.


Tape backup isn't simple. Steer clear of it.

It would be faster, easier, and cheaper to backup to a second set of disks and store them at a relative's or friend's place.

Some people keep a copy of their data at their place of work. Personally I'm opposed to that.


Why are you opposed to that? I store an encrypted ZFS volume at work. I don't expect to ever need it, but it's just one of many redundant copies stored around.


At what scale does tape begin to make financial sense?


If you need the following:

- Storing petabytes of data without requiring any power. Hard drives are cheap, but each one requires quite a bit of power

- Having the data be offline, i.e. not available to an attacker who somehow managed to take control of your network

- Having the data be in a physical format which can easily be put in a safe or in the back of a car when there is a risk of flood or fire

- Having data on a medium that won't degrade over a period of, say, 10-30 years. Hard drives tend to fail


What stops you from writing data to a spinning disk HDD and then unplugging it and putting it in a safe?


Mechanical failure [0]. As perbu found out, when moving parts are left to sit for years, things can fail. Lube can dry up, causing bearings to seize up or actuators to lose precision or get completely stuck [1], and the magnetic field degrades by about 1% per year [2]. Of course, you could get lucky w/r/t mechanical issues, as many retro computer enthusiasts can attest, but it seems to involve a lot of luck: it only takes tiny defects, over a long enough time-frame, to leave half your drives as paperweights while the others keep on chugging.

Both of these issues can be largely mitigated by simply plugging the HDDs in every year or so and re-writing the data (maybe run badblocks while you're at it to ensure the drive is still in good health), but, as throw0101a mentioned, this takes time and effort.

[0]: https://serverfault.com/a/51893

[1]: https://www.partitionwizard.com/clone-disk/do-hard-drives-go...

[2]: https://superuser.com/a/312764
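
A minimal sketch of that yearly health pass, assuming a Linux box and that the cold-storage drive shows up (unmounted) as /dev/sdX:

    # non-destructive read-write pass: reads each block, tests it, then puts
    # the original contents back, which also rewrites/refreshes the data
    badblocks -nsv /dev/sdX
    # then check the drive's own health counters
    smartctl -a /dev/sdX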


Is there a way to force a file system to recommit its data?


Seems like you could do it without even mounting the filesystem: just read out raw sectors and write them back. Like a three-line program. If you want to be fancy, compare the sectors to some checksums to make sure you're not reading back corrupt data.
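
Something like this, assuming the disk is /dev/sdX and nothing on it is mounted (and that you keep checksums elsewhere, since an interruption mid-write is risky):

    # read every block and write it back in place to refresh the data
    dd if=/dev/sdX of=/dev/sdX bs=1M conv=notrunc status=progress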


Any word on SSDs?


SSDs may actually be worse than HDDs, depending on what NAND chips are in use. Although there are no moving parts to seize up, they do require an occasional power-up because they represent data via voltage levels and we have not yet found a perfect insulator (meaning they will inevitably leak charge), and with QLC being all the rage these days (4 bits represented in each cell using one of 16 discrete voltage levels), when the voltage levels start to degrade, it could easily cause catastrophic bit rot.

How long a given SSD will last before losing data depends on the chips it is built with. I remember reading that Sandisk rates their flash storage as being able to survive a year without power (though I read that years ago and cannot find a source for it now), though [0] says 2-5 years, so technology has likely advanced since I last read about this. Either way, you should probably try and power up your SSDs at least once a year to keep the cells charged. This is less work than to keep an HDD healthy for cold storage, but it certainly isn't "set and forget" like tapes or optical disks.

Edit: I remember pulling my first SSD, a 60 GB Intel 330, from an old laptop and, despite having not been powered on for 3 years, it still booted. But that was MLC/DLC [1], which only has 4 voltage states, and I only saw that it booted, not that all my old data was intact.

[0]: https://www.easeus.com/resource/does-ssd-need-power.html

[1]: https://www.intel.com/content/dam/www/public/us/en/documents...


Hm. Makes me more sad about the state of zfs in macOS and windows - with native encryption and checksums it makes for a nice cross-platform fs for external drives...


> What stops you from writing data to a spinning disk HDD and then unplugging it and putting it in a safe?

Manual labour.

There are tape libraries with robots that can take a tape out of a slot, put it in a drive, and then put it back automatically without humans needing to do anything. If you want to move the tapes offsite, there is some manual work to move the tapes from the import/export slot to a container.

A large library like the StorageTek SL8500 can have up to 640 drives, 100'000 tape storage slots, and 2'880 import/export slots.


> Manual labour.

Big cloud providers have disk replacement robots, so it's not infeasible for robots to perform the "unplug and store the HDD somewhere" step.

But another advantage tape has is durability.


> Big cloud providers have disk replacement robots […]

[citation needed]


https://twitter.com/alibaba_cloud/status/1326078390105825281

I have no idea if anything like this is actually in use, I have my doubts. Also, a system like this probably isn't terribly useful, a Cartesian gantry robot would be much faster and way more robust than an articulated arm strapped to a mobile robot platform.


Who actually has those?


Last time I saw one of those was back when AOL was a big shot ISP. They had to buy high power SGI systems to feed the things because nothing else had the bus bandwidth to support one. I was told that the full rack sized SGI system (Origin 2000) was the cheapest part of the entire rig.


> A large library like the StorageTek SL8500 can have up to 640 drives, 100'000 tape storage slots, and 2'880 import/export slots.

Wow the scale is just immense, that's pretty cool!


Typical reliable offline HDD storage is 2-3 years. For tape it's 20-30 years.


I've got a bunch of video on a harddrive that I put in storage 20 years ago. Tried to spin it up again and it just makes funny noises.

So, yeah, spinning rust might rust.


It's possible to restore it at a data recovery shop, if you are interested.


This is what I personally do. I have small needs when it comes to backups (< 1TB), so I have one backup that's just a regular HDD stored in a drawer, and another one that's on a Scaleway cold storage container.


The spinning motor might not start after some years of inactivity.


Bitrot.


I agree with most of that, but taking a server offline isn't really harder than pulling tapes out of a library.

And if you replace dead disks every month or two, a non-networked server can probably manage more reliability over 10-30 years.


Think about tape as an archiving tool, rather than a backup-and-restore tool.

(The three major tiers are A: I accidentally deleted a file or directory yesterday, so I need to get a copy from snapshots; B: the machine I was working on exploded, so I need to rebuild it and repopulate it with my data; C: archive for some need in the far future)

Suppose you have large quantities of data that you need to preserve for legal reasons -- it's always legal reasons, unless you are in academia or similar -- but you will not be accessing much. That's when you use tapes.

For most other purposes, you want a snapshotting filesystem or an offsite filesystem made from those snapshots.


I think there's a case to be made for a subset of C: archiving video originals, film scans, and the like. While the humans might work with intermediates ("mezzanine") files, the originals are just miserably huge, but you never want to lose them.


Exactly as Gavin from The Slow Mo Guys describes https://www.youtube.com/watch?v=lO-SAzFaN18


I use it at home for offline backup. You can get LTO-6 pretty cheap, and a big chunk of my backup is just things I want to keep indefinitely. You can put tapes in multiple locations easily and not worry for 20 years.

If you just want to store lots of data in a single place I think MAID still beats the tape at any scale (especially if you want automated retrieval)


Is there an LTO-6 tape drive that is, say, less than a thousand dollars? All tape drives I've seen are between 2-4k USD.


I bought a used one (HP Ultrium 6250); you should be able to get it under $1000, closer to $500, but I recommend getting it with the controller so that you know there are no issues. Note that these controllers are made to be mounted inside a heavily ventilated rack case, so if you are putting it inside a desktop PC, I suggest making sure some fan blows on it.

It's discouraging that it's so complicated which made me back out of the purchase a few times, but it was well worth it.


Used, yes.


Once you fill shelves with them.

The tape itself is quite cheap (LTO-9 is 18TB for $100), but the drives are expensive ($5000+ for an LTO-9 drive), so you need quite a bit of volume to make it worth it.

You can get an earlier breakeven point by buying used equipment for older generations. But no matter what you do, at small scale hard drives will be cheaper (and can be put on a shelf or in a safe too)


Depends on how much data you generate. Even small game studios will use them for regular physical offsite backups for potential disaster recovery. (In extreme cases we stored regular snapshots of dev machines as vmware images).

Essential when data is created faster than the locally available internet bandwidth can move it, which in media industries is quite often the case.


I am using tape for archiving my data, and the break-even point with hard disks, the last time I checked (perhaps a year ago), was somewhere between 200 TB and 300 TB of stored data, after buying a tape drive for about $3000.

For larger amounts of data, the money savings vs. HDDs increase quickly.

The break-even point vs. HDDs is lower for long-term archiving than for short-term backups, because the lifetime of tapes is much longer (it is limited not by the degradation of the tapes but only by the risk that compatible tape drives might become hard to obtain), so you need multiple HDDs to store data for the lifetime of a single tape.


Above 300TB it starts to make sense to buy an LTO-9 drive ($4,500) and tapes ($120 each for 18TB)

Right now you can get 18TB drives for $290 so you can get 306TB for $4,930
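
A naive single-copy break-even under those list prices (this ignores the tape-longevity point above, which pushes the real figure lower):

    # tape: $4500 drive + $120 per 18 TB cartridge; disk: $290 per 18 TB drive
    # solve 4500 + (120/18)*TB = (290/18)*TB for TB
    echo '4500 / (290/18 - 120/18)' | bc -l    # ~476 TB with these exact prices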


We've got a 9-drive system with a table robot between two aisles of tapes, the whole thing is maybe 8" by 20". It definitely makes sense at that scale.


when you need long term stable storage without getting hit by bit rot


Pretty sure it's not immune to bit rot.

Have any data to back your claim?


Tape drives will, when writing, re-read the data as it is written to make sure no errors are written and will transparently correct any errors. In addition, quite a bit of error correcting code is written, making it possible to correct simple errors.

So perhaps not 100% immune, but partial immunity to bit-rot is a pretty important part of the LTO spec.


Immune, no. Far less prone to bit rot? Yeah. My source is running backups for a significant fraction of a large university for ~20 years. We had a routine cadence of tape errors, which were much less frequent than our routine cadence of disk replacement.

I suppose that's anecdote. :) But it's what I've got.


> Have any data to back your claim?

Search for "year" in the specification of HPE's Ultrium LTO media:

* https://www.hpe.com/psnow/doc/c04154430

Or IBM's white paper:

* https://www.ibm.com/downloads/cas/Z3RYV0BR


So 30 years for HPE when stored in the appropriate humidity/temperature range.


Quite large scale. Last time I priced tapes out they were more expensive per TB than hard drives.


If it didn't (for someone) then this product would not exist.


I wish the tape hardware itself wasn't so pricey! I would love to have local backups of our colo servers... alas, we pipe everything to "cloud" storage (remembering that the cloud is just someone else's server in Virginia).


I am so tired of the "someone else's server" trope as if there is no value add.

S3 is a fantastic piece of architecture and service, and the fact that the common dev has such tools and services available instantly from their personal laptop or even phone is mind-boggling.

We are very fortunate that AWS led to more services being made available to common folk than had been done in the past.

It's not all about cost or hardware ownership; the value of these near-infinitely scalable services is easy to ignore but astounding at the grassroots level.


As someone who constantly scrapes and indexes every bit in every public S3 bucket I can find, for purposes of selling query access to my indexes, I couldn't agree more - FANTASTIC value add.

Of course, I never have, don't now, and never would run a business on someone else's computer though... all the hyperscalers will happily charge you more for 3 months worth of compute than the cost of the hardware the compute is being done on. Would I rather build a server for $5k or pay $1600/mo to rent that exact same server? Such a difficult decision! /s


> all the hyperscalers will happily charge you more for 3 months worth of compute than the cost of the hardware the compute is being done on. Would I rather build a server for $5k or pay $1600/mo to rent that exact same server? Such a difficult decision! /s

The benefit is that you don't have to buy the entire server in order to use a portion of the resources on it. If you're going to use the entire server, buy the entire server — if you're only going to use a portion of a server, why buy an entire server?


> if you're only going to use a portion of a server, why buy an entire server?

Unless your compute needs fall below those of an Rpi Zero, it is almost always cheaper to buy physical hardware once that doesn't have a monthly bill attached.

Sure, scalability is a real value proposition from the hyperscalers that isn't the same with physical hardware, but is scalability worth paying 10x+ in OpEx for what could've been 1x in CapEx? For most businesses, I have a hard time making financial sense of that one.


Do you sign long term lease agreements for colo space, power and bandwidth for the hardware or are you still paying that annually/monthly?

There are no free lunches. I do not disagree buying hardware is cheaper but operational overhead and opportunity cost are real.


Eh, buying hardware isn't necessarily cheaper if you factor in operational costs like power, cooling, and especially internet (which gets very expensive very quickly for servers). Datacenters get to spread this cost out over many servers, small installations don't.


As a business owner, I don't need a whole data center worth of electricity and internet.


No, but as an example, if you want more than 30Mbps upload, it can cost you hundreds of dollars per month, depending on location.


> if you're only going to use a portion of a server

only if your data usage decreases over time.

There's a break even point, and that break even point is closer than you think. But by then, you're locked in. It is also the same reason why they are able to charge you more money than a self-setup storage system, since once you're locked in, the cost of switching is almost always just slightly higher than staying.


Do you feel like that practice is unethical at all?


I feel like companies collecting sensitive information and slapping it on someone else's computer without the slightest clue how to protect it, or running a cloud hyperscaler and leaving storage open and indexed by default is far more unethical than anything I'm doing. I'm not the unethical collector, I'm not the one who decided to host that collected info on the cloud, and I'm not the CSP who gave those unethical collectors shitty security defaults. I'm just the messenger notifying others that this is happening.

US case law is abundantly clear that web scraping is 100% legal. Further, I have never signed any agreement with AWS (I don't even have an AWS account) regarding AUP or anything like that, so I have no contractual restrictions on this either.


Probing for open buckets and selling the results to people likely looking to do nasty things with them is unethical IMO, but it's still more ethical to do it and then bring it up to warn others than to just quietly do it. I mean, how do you know that the people talking up this fantastic value add at anyone's fingertips are not doing it themselves?

When it comes to doing basic napkin math and publishing the results, that I find highly ethical, actually.


> it's still more ethical to do it and then bring it up to warn others of it, than to just do it.

A la "I was gonna murder you, but at least you were warned!"


It took me 3 secs to find this: https://buckets.grayhatwarfare.com/

So I think the practical difference is mostly between praising S3 (as a synonym for "the cloud", when Amazon wasn't even mentioned, no less) and warning people at least to use it diligently, which, when it comes to costs, might often mean not using it at all.


The context of the conversation is tape drives, which is about mass storage.

The comment I replied to said:

> someone else's server in Virgina

While everyone may operate there now, it was exceedingly common in my experience to be referring to AWS when referencing Dulles / Ashburn / NoVA or just Virginia, since AWS started there first.

So I figure S3 is the correct comparison to not having a 50 TB tape drive under your desk when "the cloud" is being discussed.


Have you considered that maybe the hyperscalers and other massive cloud storage solutions (Apple, Comcast, AT&T) leverage tapes in this manner? I suspect if you're using something like AWS Glacier, you're using "someone else's robotic tape library".


The problem with tape drives for backups is that they only just outpace the sizes of hard drives over the years and are multiples of hard drive prices.

Most of us have used tape drives in the past for backups only to find that hard drives work better, are more reliable overall, can be accessed randomly, and are cheaper.

Imagine using a 2Gig DAT tape for backups these days, as I did way back in the past. Or a 60MB QIC cartridge, as I did a few years before that DAT tape drive.


> The TS1170 drive operates at 400MBps raw throughput

Am I reading this right? At that throughput it would take about a year to fill a compressed tape (assuming 3:1 ratio is achievable).

Or is there some parallelism involved? (i.e. many tracks written in parallel to the tape, I don't know a lot about tape drives)


I don't think that is correct. 50 TB / 400 MB/s ≈ 1.45 days


You're right, I must have (seriously) messed up a decimal somewhere.

(50e12 bytes / (400e6 (bytes/second))) = 125000 seconds ~= 35 hours ~= 1.45 days


Your math is wrong.

(50 terabytes) / (400 megabytes per second) = 125,000 seconds


125,000 seconds = 34.72 hours


So the entire Spotify library can be stored on a single tape! What interesting times we live in.


Kinda wish they would pack this technology into a consumer version in Compact Cassette format.


I wonder what the world would look like if the Pereos tape drive had caught on and then been steadily improved. It was doing 1GB/tape (500MB/side) in 1994, on a tape that perches cozily on a fingertip, and theoretically should've been easy enough to build into all sorts of consumer electronics.

Sony made digital cameras that recorded to floppy disk (the Digital Magnetic Video Camera, or MaViCa), and Agfa made one that recorded to PocketZip, the 40MB shrink of Zip disks. Both random-access media, but that isn't crucial for a digital camera. Could easily record to linear media, and in fact a later Digital Mavica did just that, onto CD-R discs. So recording to tape wouldn't be a stretch at all.

Instead of deleting photos on the camera, you could perhaps mark them for deletion but then have that take effect at offload time. Then blank the tape and use it again. At a time when 64MB flash cards were barely affordable, a 500MB tape would've been a gamechanger.

I use cameras as an example because they're probably the use-case that benefits most from a capacious writable medium in a portable format.


With 3x compression and 900 MB/s it only takes 46 hours to fill one. But maybe I’m bad at math. Perhaps more importantly it will take that long to do a full recovery from one.


That's nearly 1 GB/s. ~18 minutes per terabyte. What would you be doing a full recovery of on 150 terabytes of data? This type of storage is usually for archive data not for quick restoration of active servers or databases.


Btw what does 3x compression mean in this case? Does the device itself do compression and if so how can it be a fixed rate, given the data is arbitrary?


LTO drives include compression, and manufacturers quote both the guaranteed/native capacity/speed as well as the "compressed" capacity/speed. In the case of IBM, they always use a 3:1 ratio in their specs.

Of course in reality it depends on the data you feed it so the ratio has always seemed odd to me but it is what it is.


Yes, the compression is on-device, and it's not a fixed rate, it's a peak. Efficiently-stored data like video compresses less well or not at all.


400MBps transfer speed means it would take 1 day, 14 hours to fill it...


That seems pretty good honestly for its intended use case.


Yeah, that seems surprisingly good, as I thought the write speeds I get from a spinning disk are around 100MB/s or less.


100MB/s or less? Are you using SMR 5400rpm disks? I routinely get 180-220MB/s throughput per Exos drive.


I just ran -

    dd if=/dev/zero of=tmp.img bs=1G count=1 oflag=dsync
And got - 1073741824 bytes (1.1 GB, 1.0 GiB) copied, 9.49586 s, 113 MB/s

The drive appears to be - WDC WD160EDGZ-11B2DA0, it's a 16TB disk, which I thought would be SMR, due to its size, but maybe I'm wrong.

Just found https://www.reddit.com/r/DataHoarder/comments/qf5ar3/is_this... which seems to indicate it's CMR.


> oflag=dsync

This will reduce performance
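
A somewhat more representative test (assuming ~16 GiB of free space on the target filesystem) writes more than fits in the page cache and flushes once at the end rather than synchronously on every write:

    dd if=/dev/zero of=tmp.img bs=1M count=16384 conv=fdatasync status=progress
    rm tmp.img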


That's in the same ballpark as HDDs. It takes me ~24 hours to fill an 18 TB HDD.


Tape libraries (even small ones) can have multiple drives in the same 6U chassis. You can link multiple chassis together to build a bigger system.


Since SATA/NL-SAS has eaten up the lower tier of enterprise HDDs, you never write to the tape directly; it's always D2D2T (disk-to-disk-to-tape).


That's great for tape.

Only marginally slower than your average SSD.


Wow. Prosumer storage technology has rocketed past even enterprise backup technology.


TIL tape drives still exist.


I used to do consulting work for storage vendors and anytime someone would imply a technology was legacy or on the way out they'd bring up how tape and mainframe still made money so nothing old ever dies... To be fair, I suspect some very cold tier cloud storage services have tape behind them.


I used to work in the bulk storage industry and the running joke from 2005 until 2018 when I left was "surely this is the year tape will become obsolete".

Cold tier cloud storage absolutely has tape behind it. You can even send tapes to some cloud vendors.

Broadcasting is another giant customer. Companies like WBD have exabytes of digitized programming (movies, series, documentaries) which would be impractical to store on disk. The typical workflow is for the program scheduler to direct the robot to fetch the tape containing a program 24 hours in advance of the broadcast time and copy it to disk, then erase the disk copy after broadcast.


Not only do they exist, they sell in higher quantities than they ever have. When you need near-absolute faith in archived data, there's not much else that competes. The expected life is longer than drives by close to an order of magnitude.


We back up 1.5 PB of data per month to tape. What alternative solution would you suggest, given that 'cloud' in all its guises is too expensive?


I'm surprised HDDs don't turn out to be the less expensive option once you factor in all the infrastructure + man-hours. Do you have a ballpark for the percentage cost difference in your use case?


Worth noting that many uses for hard drives are IO limited. Ie. you're running a database and keep a few hundred GB of hot records in RAM, tens of TB of medium records on SSD, and a few PB of rarely accessed records on hard drives.

However, even those rarely accessed records often have more accesses per year than the hard drives can support - so you are forced to buy 400 drives instead of 200 drives and keep each half full.

End result: There is a lot of 'almost free' space on the hard drives, as long as you are happy to only read/write the data very rarely.
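
A rough sketch of why IOPS rather than capacity ends up sizing the fleet (the per-drive figures here are ballpark assumptions, not measurements):

    # assume ~150 random IOPS and 20 TB per nearline HDD
    echo '60000 / 150' | bc    # 60k random reads/sec at peak -> 400 drives for IOPS
    echo '4000 / 20' | bc      # 4 PB of cold data            -> 200 drives for capacity alone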


If you're having such an IO bottleneck that you're doubling your HDD array to provide the necessary throughput, how do you actually run backups or restores using those disks? Do you hope you only need them in off hours?


At the PB scale you start running robotic tape machines that eliminate much of the human labor involved.


How much is 1.5PB of tapes, and 1.5PB of hard drives?


I would hazard a guess that storing tapes takes up significantly less physical space than hard drives.

I don't think hard drive based storage and tape bases storage have quite the same use case.


> I would hazard a guess that storing tapes takes up significantly less physical space than hard drives.

An 18TB tape is basically the same size as an 18TB 3.5" hard drive.


I doubt that this can be true.

I am using older LTO-7 cartridges, with a raw capacity of 6 TB. The newer LTO cartridges must be compatible mechanically with the old cartridges, because the same drive must be able to read and write them.

The LTO cartridges are smaller than 3.5" drives; they have maybe about 2/3 of the volume of a 3.5" HDD.

Moreover, a tape cartridge is many times lighter than a HDD. I have suitcases with 20 tape cartridges that are as convenient to carry as any typical suitcase.

A case carrying 20 HDDs would have a very noticeable weight, and its volume would be greater not only because of the bigger HDDs but also because it would require serious padding to avoid destroying the drives if the case were dropped accidentally.


OK, you made me do the math. LTO tapes are 225 cm^3 and 3.5" HDDs are 245 cm^3 so I stand corrected. The weight point is a really good one as well as a HDD is 3 times the weight of a tape.


About $8000 of LTO-9 tape, $21000 of Toshiba MG10 hard drives, $38000 of Seagate IronWolf Pro NAS hard drives.


Tapes are roughly 3 times cheaper per TB than hard drives.


18TB(raw) LTO-9 tapes go for US$ 105:

* https://tapeandmedia.com/lto-ultrium-9-tapes/

If your data is compressible then you can of course store more. The drives themselves use Streaming Lossless Data Compression (SLDC), though of course you can send them things like gzip, JPEG, or MPEG files which are already compressed:

* https://www.ecma-international.org/wp-content/uploads/ECMA-3...

LTO-10 is supposed to have 36TB raw and is estimated for perhaps 2024.



