My quick TL;DR for other readers (please let me know if I'm wrong):
The systems described are servers over-provisioned with an order of magnitude more hard drives than you would typically see in a standard server. This allows a much higher ratio of total drive capacity to other components, cooling, and power, but at the cost of only being able to spin up a small fraction of the drives simultaneously. It then becomes a complicated scheduling problem: optimizing which drives to spin up, and when, based on the total workload of everything you are trying to read and write at a given time. This system makes sense for data that is expected to be accessed on the order of once per year or less.
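A minimal sketch of that scheduling problem, assuming a simple greedy policy and an illustrative spin-up budget (none of the names or numbers come from Amazon):

```python
from collections import defaultdict

MAX_SPINNING = 8  # illustrative power/vibration budget: how many drives may spin at once

def pick_drives_to_spin(pending_requests, max_spinning=MAX_SPINNING):
    """Greedy pass: power on the drives that satisfy the most queued requests.

    pending_requests: iterable of (drive_id, request_id) pairs for archived objects.
    Returns the set of drive ids to spin up for this scheduling round.
    """
    per_drive = defaultdict(list)
    for drive_id, request_id in pending_requests:
        per_drive[drive_id].append(request_id)

    # Favour drives with the deepest queues so each spin-up amortises over many requests.
    ranked = sorted(per_drive, key=lambda d: len(per_drive[d]), reverse=True)
    return set(ranked[:max_spinning])

# Example: five queued retrievals spread over four drives, budget of two spinning drives.
queue = [(1, "a"), (1, "b"), (2, "c"), (3, "d"), (4, "e")]
print(pick_drives_to_spin(queue, max_spinning=2))  # e.g. {1, 2}
```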
When I was running Exchange systems, our biggest challenge was delivering IOPS. We had to use a SAN, and we wasted significant storage because we'd exhaust our IOPS budget at 40-60% of storage capacity.
I figured at their scale they would have similar problems.
He meant: what if EBS has the same issue as his Exchange servers? To explain in more detail: you have 10 TB of disk space with 10,000 IOPS; your users buy 4 TB with 10,000 IOPS, and you have 6 TB of storage wasted.
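A back-of-the-envelope version of that arithmetic (numbers taken from the example above, purely illustrative):

```python
# Illustrative numbers from the example above.
total_capacity_tb = 10
total_iops = 10_000

sold_capacity_tb = 4      # storage the customers actually provisioned...
sold_iops = 10_000        # ...while consuming the entire IOPS budget

stranded_tb = total_capacity_tb - sold_capacity_tb
print(f"Stranded capacity: {stranded_tb} TB "
      f"({stranded_tb / total_capacity_tb:.0%} of the array)")  # 6 TB (60%)
```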
If Amazon has that problem with EBS, then selling that storage capacity as Glacier and using just the idle IOPS (or leaving a small bit reserved) allows them to sell capacity that would otherwise just be useless.
That's the point. They aren't trying to sell IO with Glacier, since they've already saturated that with EBS. They just want to sell the spare storage capacity, ideally in a write-once, read-never use case. That way they can get 100% utilization out of the drives.
So if you use a lot of IO with Glacier, they are going to charge you like crazy, since you're potentially impacting EBS customers.
I'm that guy. I should update the post; Amazon "fixed" the retrieval fees in late 2016 and I would've paid less than a dollar had the current pricing scheme been in effect when I did the retrieval.
With Exchange, we had all of this expensive, reliable SAN storage that would have been perfect for a low-requirement, Glacier-like solution. Unfortunately, we lacked the ops mojo to pull it off.
Archive is not about IOPS; it's about streaming bandwidth.
For example, I used to look after a Quantum iScaler 24-drive robot; each drive was capable of kicking out ~100 megabytes a second. It was more than capable of saturating a 40-gig pipe.
However, random IO was shite: it could take up to 20 minutes to get to a random file. (Each tape is stored in a caddy of, from memory, 10 tapes; there is contention on the drives, and then spooling to the right place on the tape.)
Email is essentially random IO on a long tail. So, unless your users want a 20-minute delay in accessing last year's emails, I doubt it's the right fit.
The same applies to optical disk packs (although the spool time is much less).
I think that's the point - the e-mail is using up all of the IOPS. There would be a small amount of IOPS left over that could deal with streaming data, which is unlikely to be accessed on a regular basis. The capacity not used by e-mail would then be used for the archive - data that's pretty much write-only.
Why do you care how fast you can read it back when you're storing it for regulatory purposes? Isn't that a sunk cost? Buy high capacity, high reliability and don't care for the read speed?
With SANs, the IOPS budget is a function of your hardware config. If you want more IOPS, you get more RAM/SSD involved. More importantly, Amazon gets to sell EBS on their terms: a specific amount of IOPS with a specific amount of storage. If you want more IOPS, you have to buy more EBS. The "wasted storage" you're thinking of would be on your instance using EBS, not EBS itself.
Using BDXL seems like a pretty good solution. Most of this data is archival, and existing data is very unlikely to change. You can use HDD/SSD as a buffer as users upload data, and then optimize the packing to ensure you're using all available space on a disc, possibly encrypting each user's data block on the disc. The system itself would only need to track metadata (file metadata, cartridge/disc, key). Deleting a file would be deleting the key and marking the file as inactive. Once/if a cartridge is marked as completely deleted, you can just recycle it.
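A rough sketch of the bookkeeping described above, assuming first-fit packing onto ~100 GB BDXL discs and crypto-shredding for deletes; the class and field names are invented for illustration:

```python
import os
import uuid

BDXL_CAPACITY = 100 * 10**9  # ~100 GB per triple-layer BDXL disc (illustrative figure)

class ArchiveCatalog:
    """Tracks only metadata: which disc a block landed on and its encryption key."""

    def __init__(self):
        self.discs = []    # each disc: {"id": ..., "free": bytes remaining}
        self.records = {}  # block_id -> {"disc": ..., "size": ..., "key": ...}

    def place(self, block_id, size):
        """First-fit packing of an encrypted block onto a disc with enough space."""
        for disc in self.discs:
            if disc["free"] >= size:
                break
        else:
            disc = {"id": uuid.uuid4().hex, "free": BDXL_CAPACITY}
            self.discs.append(disc)
        disc["free"] -= size
        self.records[block_id] = {
            "disc": disc["id"],
            "size": size,
            "key": os.urandom(32),  # per-block key; discarding it makes the data unreadable
        }
        return disc["id"]

    def delete(self, block_id):
        """Crypto-shred: drop the key and mark the block inactive; recycle the disc later."""
        self.records[block_id]["key"] = None

catalog = ArchiveCatalog()
catalog.place("user42/backup.tar", 30 * 10**9)
catalog.place("user42/photos.tar", 80 * 10**9)  # doesn't fit on disc 1, so a second disc is opened
catalog.delete("user42/backup.tar")
print(len(catalog.discs), "discs in use")       # 2
```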
Could you provide me some insight into why optical storage is a better solution than standard HDDs? Is it just the cost, or is cooling / form-factor a big part of it?
Why isn't anyone thinking of tapes? You can get LTO-7 tapes for $0.008 per gigabyte that allow 100-300 writes before the tape should be destroyed. Quantum and HP make monstrous tape libraries that hold 5-10 petabytes per rack. You can also cartridge-ize your library for even denser storage on a literal warehouse rack somewhere.
Tapes also match the slow retrieval speeds as you have to read the data out onto a drive linearly.
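Back-of-the-envelope media cost from those numbers, assuming LTO-7's 6 TB native (uncompressed) capacity per cartridge; drives and library hardware would dominate the real cost:

```python
# Illustrative media-cost arithmetic for the numbers quoted above.
price_per_gb = 0.008       # USD per gigabyte, from the comment
lto7_native_tb = 6         # LTO-7 native (uncompressed) capacity per cartridge

cartridge_cost = price_per_gb * 1000 * lto7_native_tb
print(f"~${cartridge_cost:.0f} per cartridge")                       # ~$48

rack_pb = 5                # low end of the "5-10 petabytes per rack" claim
cartridges = rack_pb * 1000 / lto7_native_tb
print(f"~{cartridges:.0f} cartridges, ~${cartridges * cartridge_cost:,.0f} of media per rack")
```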
This is an extremely interesting deductive analysis. However, considering it is Amazon, there always exists that persistent "other" possibility: they're purposefully taking a loss.
Or at any rate, pricing based on the net present value of the archived data, given storage costs that decrease over time.
I do find it rather fascinating that AWS has managed to keep the technology used by Glacier, even at a high level (i.e. disks vs. tape vs. optical), so under wraps. My personal guess is that it's powered-down disk drives, on the grounds that's the simplest long-term solution, but that's purely a guess.
This is one reason to assume it's unremarkable. If Amazon were buying and offloading truckloads of BDXL disks and drives, someone would eventually notice. A good explanation is, therefore, that the technology is unremarkable and boring.
> Google claims Coldline access within milliseconds
Well, that basically tells you almost all you need to know. It's disk in JBODs. The only question is SMR vs conventional. Anyone who knows that can't tell you in public.
Amazon's latency is measured in hours. Whatever process they use involves literally cold storage: either disks that are completely switched off, or some tape-like media archive system, though the article makes a good case for why it's almost certainly not tape.
It could still be normal disks, if we're feeling conspiratorial. It's easy to make faster storage slower; just add waits. Or maybe they make it take up to five hours so they can avoid peak traffic times in whatever data center your info happens to be located in.
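If the delay really were deliberate, the scheduling could be as dumb as deferring every restore to the next off-peak window. A toy sketch of that idea, purely speculative; the hour and function name are invented:

```python
from datetime import datetime, timedelta

OFF_PEAK_START_HOUR = 2  # illustrative: 02:00 local to whichever data center holds the data

def schedule_restore(requested_at: datetime) -> datetime:
    """Defer a retrieval job to the next off-peak window instead of serving it immediately."""
    candidate = requested_at.replace(hour=OFF_PEAK_START_HOUR, minute=0,
                                     second=0, microsecond=0)
    if candidate <= requested_at:
        candidate += timedelta(days=1)
    return candidate

print(schedule_restore(datetime(2017, 3, 1, 14, 30)))  # 2017-03-02 02:00:00
```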
I've got a USB3 BDXL writer attached at my desk and it is quite handy and not too expensive. I back up my whole data (work) partition to it every so often and occasionally take one over to a relative's house as my own home-grown "glacier" system.
And optical media is also immutable (read only) when you need it to be immune to any data changes.
I find that read-only, point-in-time backups gain value over time, especially if you need to pull a file that would have long since been rotated out and replaced by newer backups on read-write media (e.g. HDDs).
Unfortunately, the market for this use case is not large, and that is reflected in the prices and in how hard it is to source high-quality optical media. For BD this would be (inorganic) HTL Panasonic media, which only has a market inside Japan itself. M-Disc is the other alternative, although it has only proven itself in the DVD market; classical HTL BD media is expected to be very similar in endurance to what M-Disc offers in the BD range.
Would you not be better off uploading it somewhere instead? If your relative lives close enough, a single natural disaster could destroy all your backups.
Ex-Glacier engineer... and no I'm not going to tell you what or how it's done. NDAs and all that jazz. These speculation threads always make for fascinating reading for people on the team.
Glacier in particular seems to attract the speculative fascination. Do people not realise the name is not in jest, it really is done with graphene-doped room-temperature ice crystals and laser interference lithography?
I liked the story that the truck-mounted Snowball came from Glacier tech. Amazon has been putting data on a truck and then driving it around Virginia. The delay in reading it back is the time it takes for the truck to arrive at a datacenter and plug in. :)
That seems to have been the major stumbling block with higher-capacity optical media: one can't do the drag-and-drop writes that one has with spinning rust and flash chips.
What if they are using 2.5-inch 5TB drives like the one I use: http://www.theverge.com/circuitbreaker/2016/11/15/13642078/s... They are nice because you can plug them into a 15-port USB hub, and they auto power down when not in use. Amazon could have developed a box like what backblaze.com has done.
You are mistaken here. Tapes, unless stored under carefully controlled conditions, usually last ~20 years safely. They can last longer, but not without a high risk of deterioration.
In my experience, written CDs and DVDs only last <10 years, if you're lucky, although studies show you can get 30-45, even 45+, years out of them.
Most Blu-ray life expectancy exceeds this, due to its different layering and coating, which is not based on an organic dye.
The one mentioned above, M-Disc, was developed for DARPA and is supposed to last 1,000 years in theory.
I seriously doubt it. If I have many TB of stuff I need to have saved off site, you can bet the very first thing I'm going to do is compress the hell out of it. So they certainly wouldn't be able to count on compressing it further.
SSD isn't quite ready for backup use cases (and certainly wasn't when Glacier was built a few years ago).
However, it is 2017, so we can say for sure that the extrapolation to 2016 in the linked article from 2010 was pretty good. It was too optimistic by a factor of ~2 in density and ~10x in cost, but it still holds up well compared to most predictions from a year or two ago.
Since any whacko can claim they made a prediction in 2010, I double checked:
The first comment on TFA says as much.
Edit: This is the actual Pelican paper: https://www.microsoft.com/en-us/research/wp-content/uploads/...