AWS Storage Gateway (amazon.com)
103 points by bpuvanathasan on Jan 25, 2012 | 31 comments





I don't have a personal need for this right now, but the fact that it stores the data as EBS volumes is pretty cool. I could imagine having local servers automatically mirrored so that they could be failed over to instances on ec2. Very powerful stuff indeed.


Note that the mirroring is asynchronous -- if your local server fails you can replace it with an EC2 instance, but you lose anything since the last snapshot.


Not since the last snapshot. It syncs as close to real-time as possible without the latency and network speeds affecting the performance locally.

This solution isn't designed to replace fault-tolerance on local hardware. It is for close to realtime offsite replication and backup.

Data in S3 is stored in at least three geographically separate locations and snapshots are very fast and very efficient on storage space.

The final major advantage you get through a solution such as this is that if you do have your primary site go down (floods, tornados, etc), you can bring up all your existing images via EC2 without having to have a bunch of redundant hardware sitting around waiting for disaster to strike.

And what do you pay for this? $125/month plus a per-GB storage cost that is CHEAPER than enterprise storage generally is.


Let's say I need 50TB of usable space. I can purchase an Equallogic PS6100E (a mid-level "enterprise" storage device) with twenty-four 3TB drives and 5 years of support for $85,000. Rack space and power aren't free, so let's say the total cost of equipment and facilities over 5 years is $100,000.

In contrast, storing 50TB for 5 years on S3's reduced redundancy storage would cost $250,000. If I ever need to transfer any of that data back to my data center, there'll be a hefty bill for that as well.
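To put the two totals on the same footing, here is a rough sketch that converts each into a cost per GB per month. The dollar totals are the ones quoted above; the decimal terabyte (1 TB = 1000 GB) and the even spread over 60 months are my assumptions, and transfer fees are left out.

```python
# Assumptions (not from the comment): 1 TB = 1000 GB, cost spread evenly over 60 months.
GB_PER_TB = 1000
MONTHS = 5 * 12


def cost_per_gb_month(total_dollars: float, terabytes: float, months: int = MONTHS) -> float:
    """Spread a total cost evenly over every GB and every month."""
    return total_dollars / (terabytes * GB_PER_TB * months)


on_prem = cost_per_gb_month(100_000, 50)  # array plus facilities, as quoted
s3_rrs = cost_per_gb_month(250_000, 50)   # S3 reduced redundancy, as quoted

print(f"on-prem: ${on_prem:.3f}/GB-month")
print(f"S3 RRS:  ${s3_rrs:.3f}/GB-month")
print(f"ratio:   {s3_rrs / on_prem:.1f}x")
```

Under those assumptions the local array works out to roughly 3 cents per GB-month against roughly 8 cents for S3, a 2.5x gap before any staffing or redundancy costs are counted.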


Except to get the same redundancy as this AWS product you would need a replica of this storage device in at least 2 other datacentres, you would have to ensure you have enough spare capacity lying around for any snapshots you take and have to buy or build software that manages this 3-way data synchronisation between sites.

Also, you seem to be forgetting that you still need local storage for your images anyway. This is a hands-off backup and disaster recovery product.

For full disclosure, the Storage Gateway's pricing isn't the same as S3's at the moment. They only have one storage tier and it is $0.14 per GB, no discounts. Therefore, 50TB of storage over 5 years is $420k.

Having said that, what would it cost to:

* Have not only your primary but also 2 x secondary PS6100Es.

* 2 extra datacentres with connectivity to each other and to your primary site.

* Software to manage asynchronous streaming of data from your primary to 2 x secondaries.

* Software to take consistent backups of these images and store them at all three locations.

* Software to ensure that your secondary sites contain only encrypted data.

* Cold-spare hardware at a secondary site capable of running all your images.


With Amazon you could facilitate a DR onto their other platforms. With your Equallogic example you would need to mirror that data to another device with recovery targets in another location.


At the scales he's talking about, 10 Gbps uplinks are easily available. That said, the killer isn't the equipment costs, it's the people.


If I read it correctly, the server can fail but because the data is stored via iSCSI, it will not actually be lost (unless it is the appliance itself that fails)


Sure. But if we're ignoring the case where the appliance fails, it's no better than a drobo.


the storage for the appliance can be outside the appliance (DAS or SAN storage). the appliance can die, but you just plug in a new appliance and point it at the storage, and you're back... as for the drobo comment, this should be faster than a drobo... and even ignoring appliance failure, the off site backup is key.


So you are basically paying $125+ to have S3 cached locally and get an iSCSI interface?


Which is worth it.

The price of enterprise backup solutions is crazy.


Note that the $125 "gateway" fee is minor compared with the storage/transfer fees themselves - $0.14/GB/month for storage, for example. At that rate, a small 5 TB volume is already $700/month.
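A quick sketch of that arithmetic, using the $125 gateway fee and $0.14/GB/month rate quoted in this thread. The decimal terabyte (1 TB = 1000 GB) is an assumption, transfer fees are ignored, and the function name is just for illustration.

```python
# Rates as quoted in the thread; 1 TB = 1000 GB is an assumption.
GATEWAY_FEE_CENTS = 125_00     # per gateway per month
STORAGE_CENTS_PER_GB = 14      # $0.14 per GB per month
GB_PER_TB = 1000


def monthly_cost_dollars(terabytes: int, include_gateway_fee: bool = True) -> float:
    """Monthly bill in dollars, computed in integer cents to avoid float drift."""
    cents = terabytes * GB_PER_TB * STORAGE_CENTS_PER_GB
    if include_gateway_fee:
        cents += GATEWAY_FEE_CENTS
    return cents / 100


print(monthly_cost_dollars(5, include_gateway_fee=False))        # storage only, 5 TB
print(monthly_cost_dollars(5))                                   # storage plus gateway fee
print(monthly_cost_dollars(50, include_gateway_fee=False) * 60)  # 50 TB over 5 years
```

The last line reproduces the $420k figure given earlier in the thread for 50TB over 5 years, which suggests that estimate also used decimal terabytes.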


This should make EMC sit up and take notice. Amazon is doing great. I would have expected Dropbox to do this after iCloud was released. They should at least mimic this now.


Can someone give a brief explanation of what this service is good for?


ok, couple of things to start with:

1: iSCSI can be seen as a USB drive over a network. you plug in, your machine sees it as a drive, you write data. you can unplug, and then plug in somewhere else, and as long as the file system is readable on the new machine, you can get your files.

2: the appliance AWS are offering gives you iSCSI volumes, backed by DAS or SAN storage locally, but also backed by S3 storage in AWS.

So, basically, it's like having a drive, automagically backed up to S3, but S3 does not need to know anything about what is on the drive. it could be VMs, Videos, your mail server... anything really.


It sounds to me like enterprisey dropbox: It backs up your files from your local file server to the aws cloud, and if your server dies, bung in a new one and all your files will reappear. Great idea, although maybe something you could already do with dropbox?


As far as I understood the product page and Werner Vogels' blog post[1] it does not run on file level, but on filesystem level. So you won't be able to access single files from within S3, but rather have whole disk images stored in Amazon S3, ready to be restored back to your local datacenter or to be mounted on EC2 servers.

1: http://www.allthingsdistributed.com/2012/01/The-AWS-Storage-...


Actually even lower level than that. iSCSI just makes a block device available over a TCP/IP network. You can use that block device however you like, write random data to it, partition it into multiple volumes, use it as part of a volume group or disk pool, etc. The individual block device doesn't need to have a filesystem on it.
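To make "block device" concrete, here is a toy in-memory sketch. It is not iSCSI and has no network part; the class name, the 512-byte block size, and the method names are all made up for illustration. The point is only that the interface is "read or write block number N", with no notion of files.

```python
class ToyBlockDevice:
    """A fixed-size array of equally sized blocks, addressed by block number."""

    def __init__(self, num_blocks: int, block_size: int = 512) -> None:
        self.num_blocks = num_blocks
        self.block_size = block_size
        self._data = bytearray(num_blocks * block_size)  # starts zero-filled

    def _check(self, lba: int) -> int:
        """Validate a block address and return its byte offset."""
        if not 0 <= lba < self.num_blocks:
            raise IndexError("block address out of range")
        return lba * self.block_size

    def write_block(self, lba: int, payload: bytes) -> None:
        if len(payload) != self.block_size:
            raise ValueError("payload must be exactly one block")
        start = self._check(lba)
        self._data[start:start + self.block_size] = payload

    def read_block(self, lba: int) -> bytes:
        start = self._check(lba)
        return bytes(self._data[start:start + self.block_size])


dev = ToyBlockDevice(num_blocks=8)
dev.write_block(3, b"\xab" * 512)  # arbitrary bytes, no filesystem involved
```

Anything layered on top (partitions, a volume group, a filesystem) is just a convention about what those blocks contain.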


Except you can use it to power on EC2 machines with that data, so you can use it for dealing with request spikes, for offsite data processing, or even for Disaster Recovery of your on-premises infrastructure on Amazon.


From my brief reading of Werner's blog post, it seems that this is very similar to what Nasuni have been selling.


This is really cool. Having just done a large and complicated S3 integration into a legacy soup of 20 year old filesystem based document management kludge, this would have made life much easier (and considerably cheaper!).

Not only that, it solves a lot of problems such as dangerously storing backup snapshots on-site, archival and easy deployment and access to S3's CDN functionality.

Sold as far as I am concerned!

(Yes I know it's expensive, but it's cheaper than buying something in-house and employing another hairless monkey to manage it).


Sounds like a great idea. I can't wait until they release Gateway-Cached Volumes myself as it better suits my use-case.


the volumes are cached... from reading the post, you write synchronously to the iSCSI device, and the data is then sent asynchronously to S3.
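A minimal sketch of that write path, as I understand it from the description. Plain dicts stand in for the local disk and for S3, and every name here is hypothetical; a real gateway would do the upload in a background worker.

```python
from collections import deque


class WriteBackVolume:
    def __init__(self):
        self.local = {}         # block number -> bytes, the on-premises copy
        self.remote = {}        # stand-in for the copy held in the cloud
        self.pending = deque()  # blocks written locally but not yet uploaded

    def write(self, block, data):
        """Synchronous part: returns as soon as the local copy is updated."""
        self.local[block] = data
        self.pending.append(block)

    def upload_pending(self, max_blocks=None):
        """Asynchronous part: push queued blocks to the remote copy later."""
        sent = 0
        while self.pending and (max_blocks is None or sent < max_blocks):
            block = self.pending.popleft()
            self.remote[block] = self.local[block]
            sent += 1
        return sent

    def unreplicated_blocks(self):
        """Blocks that exist only locally at this moment."""
        return set(self.pending)


vol = WriteBackVolume()
vol.write(0, b"first")
vol.write(1, b"second")
vol.upload_pending(max_blocks=1)  # simulate a slow uplink: only one block gets out
print(vol.unreplicated_blocks())
```

Whatever is still in the pending queue when the local site fails is the data you lose, which is the asynchronous-mirroring caveat raised near the top of the thread.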


I'm not sure if that's what kondro meant but what I would like to see is a virtually unlimited size ("elastic") volume where the local disk acts only as the cache.

This would make a wide range of big-storage use-cases ridiculously trivial - those where only ~10% of the data-set is frequently accessed.

I.e. one could lazily scale the expensive local storage with throughput-demand, while the S3 backing store takes care of the long-tail (which can easily be many terabytes long when you're dealing with media files).
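Roughly what I have in mind, as a toy sketch: a small least-recently-used cache in front of a much larger backing store. A dict stands in for S3 and an OrderedDict for the local disk; the class and attribute names are invented, and this says nothing about how Amazon would actually build it.

```python
from collections import OrderedDict


class CachedVolume:
    def __init__(self, backing_store, cache_blocks):
        self.backing = backing_store  # stand-in for the full data set in S3
        self.cache = OrderedDict()    # stand-in for the small local disk
        self.cache_blocks = cache_blocks
        self.hits = 0
        self.misses = 0

    def read(self, block):
        if block in self.cache:
            self.hits += 1
            self.cache.move_to_end(block)   # mark as most recently used
            return self.cache[block]
        self.misses += 1
        data = self.backing[block]          # slow path: fetch from the backing store
        self.cache[block] = data
        if len(self.cache) > self.cache_blocks:
            self.cache.popitem(last=False)  # evict the least recently used block
        return data


backing = {n: bytes([n]) for n in range(100)}
vol = CachedVolume(backing, cache_blocks=10)
for _ in range(20):
    for block in range(5):  # a small hot set, read over and over
        vol.read(block)
print(vol.hits, vol.misses)
```

With a hot set smaller than the cache, only the first read of each block goes to the backing store (5 misses against 95 hits here), so local capacity only has to track the working set rather than the whole archive.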


He's referring to the upcoming feature in the second paragraph at http://aws.amazon.com/storagegateway/faqs/#How_Storage_Gatew...


Let's say you expect to grow to 20TB of data. Storing that for one month on S3 costs $2,560 (standard) or $1,700 (reduced redundancy). In contrast, a Dell R515 with twelve 2TB drives costs $7,000. Over a year, that's one-quarter to one-third the price of using S3.

Implementing a tiered storage system yourself is pretty complex. Using this S3 gateway might be simpler, but it's not trivial (e.g. you'll need VMware ESXi just to get started).
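For anyone who wants to check the numbers, a small sketch using only the prices quoted above. It deliberately ignores power, staff, backups, and S3 transfer fees, and the helper names are mine.

```python
SERVER_COST = 7_000  # Dell R515 with twelve 2TB drives, one-time
S3_MONTHLY = {"standard": 2_560, "reduced redundancy": 1_700}  # 20TB, per month


def break_even_months(one_time_cost: float, monthly_cost: float) -> float:
    """Months of the recurring fee that add up to the one-time purchase."""
    return one_time_cost / monthly_cost


def first_year_ratio(one_time_cost: float, monthly_cost: float) -> float:
    """One-time purchase as a fraction of twelve months of the recurring fee."""
    return one_time_cost / (12 * monthly_cost)


for tier, monthly in S3_MONTHLY.items():
    print(
        f"{tier}: ${12 * monthly:,}/year, "
        f"server is {first_year_ratio(SERVER_COST, monthly):.0%} of that, "
        f"break-even after {break_even_months(SERVER_COST, monthly):.1f} months"
    )
```

On hardware price alone, the server pays for itself in under three months against standard S3 and in a little over four against reduced redundancy.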


Well, I only glanced at their current offering, missed the VMware part. My request was mostly wishful thinking.

I.e. instead of VMware it'd be more useful for us to hook in with a FUSE-layer or a patched variant of a filesystem such as GlusterFS.

You're of course correct about the pricing. Their current prices cover some middle-ground but would need to be discounted to make it feasible for larger deployments. However, at the low-end (your 20T figure) the price seems already justifiable when you factor in staff and infrastructure costs (rack+power alone make up for half of the difference).


ok, when you get to the TB of storage part, things get cheaper running in house, but a couple of notes:

1: the dell at $7k does not include power, and your 12 2TB drives give you 20TB usable with RAID6 (losing 2 drives). if you lose more than 2 disks, you are screwed... so, you need to back that up somewhere...

2: you need someone to manage that machine also...

3: ESXi, for what you would need here (8GB ram or so) is free, unless you want support....

i think in all fairness, that depending on the amount of storage you need or want, it's swings and roundabouts... i like the idea, but i would also like the idea of having a box in house with a lot of storage (like the big dell) and only select some parts for off site backup... this is what i do... most of my stuff is stored locally (RAID 1, Thecus NAS, Drobo) and only important stuff (music and videos i bought, photos i took, etc) is backed up to the "cloud"...
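the RAID6 number in point 1 is just this, as a quick sketch. the function name is made up, and it ignores filesystem overhead and hot spares.

```python
def raid6_usable_tb(num_drives: int, drive_tb: float) -> float:
    """RAID 6 spends two drives' worth of space on parity: usable = (n - 2) * size."""
    if num_drives < 4:
        raise ValueError("RAID 6 needs at least four drives")
    return (num_drives - 2) * drive_tb


print(raid6_usable_tb(12, 2))  # twelve 2TB drives, as in the Dell example
```

two drives of parity is also why the array survives two failed disks but not a third.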



