AWS is slowly becoming the Oracle of our generation, in the sense that they have found a way to lock startups and large companies into a software/services ecosystem that is really hard to stop using once you get started.
You start with regular open-source instances, but that's just the hook. Once you have EC2, it's really easy to get started with AWS 'magic' services like ElastiCache and RDS. It's easier than setting up a memcache cluster or MySQL, right? But once you get comfortable with those services, it's just so easy to keep going down that road and making your software reliant on proprietary services like SimpleDB, S3 and AWS Data Pipeline. And then you wake up at some point and find that you're 100% dependent on AWS.
By that point, if you're lucky your monthly AWS bill gets you an invite to speak at the next AWS conference. :-) You might even get a personal customer support rep that calls you when your servers go down.
A website/service cannot by definition be HA if it's reliant on one service or infrastructure provider. AWS has so many proprietary parts now that you really need to be careful which ones to use so that you don't wake up one day and realize that you're completely dependent on AWS.
I wouldn't touch this with a 30-foot pole, but if we really did need to use it, I would only use the features that I felt comfortable building internally at some future point if we chose to move off of AWS.
It's important to keep your software stack as flexible and open as possible, and for risk management you should plan on using (or at least having the option of using) multiple vendors and service providers.
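For what it's worth, one way to keep that option open is to have application code talk to a thin interface you own rather than to a vendor SDK directly. A rough sketch only: `BlobStore`, `MemoryBlobStore`, `LocalBlobStore` and `archive_report` are made-up names for illustration, and an S3-backed class would be one more subclass that is not shown here.

```python
from abc import ABC, abstractmethod
from pathlib import Path
from typing import Dict


class BlobStore(ABC):
    """Hypothetical minimal storage interface the application depends on."""

    @abstractmethod
    def put(self, key: str, data: bytes) -> None:
        ...

    @abstractmethod
    def get(self, key: str) -> bytes:
        ...


class MemoryBlobStore(BlobStore):
    """In-memory backend, handy for tests."""

    def __init__(self) -> None:
        self._blobs: Dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self._blobs[key] = data

    def get(self, key: str) -> bytes:
        return self._blobs[key]


class LocalBlobStore(BlobStore):
    """Plain-filesystem backend: the 'could we build this ourselves' case."""

    def __init__(self, root: str) -> None:
        self.root = Path(root)

    def put(self, key: str, data: bytes) -> None:
        path = self.root / key
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(data)

    def get(self, key: str) -> bytes:
        return (self.root / key).read_bytes()


def archive_report(store: BlobStore, name: str, body: bytes) -> str:
    """Application code only sees BlobStore, never a specific vendor."""
    key = f"reports/{name}"
    store.put(key, body)
    return key
```

Swapping providers then means writing one new subclass instead of touching every call site, though you do give up any vendor feature the interface doesn't expose.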
The thing is, though, even though it's in Amazon's interests to create dependence on AWS, it's also in their customers' interests to use those services.
When you double down on a rich platform you can get enormous advantages. Avoiding the inner platform is a biggie; not paying portability tax is another.
The urge to be independent of any vendor, any platform, etc. is attractive to us as engineers. But it comes at a high price too.
"A website/service cannot by definition be HA if it's reliant on one service or infrastructure provider." You seem to be conflating highly available with a diverse supply chain. A lot of highly available systems are "locked" in to one provider, whether it's Broadcom, Citrix, Intel, etc.
Actually, yeah. How about worldwide NX-OS crashes due to the leap second bug? Or the various poison BGP updates that've made the rounds? Or overrunning an OSPF domain? Anyway, my point, if you read the sentence after my quote, was that there's a distinction between a sole-source provider and "HA". Multi-source supply is due diligence. But it's not a prerequisite for, or a solution to, high availability systems.
Just this week I was looking for a better solution that would back up my RDS database to S3. I'm currently using mysqldump, but the RDS instance has grown extremely large, so that has become unwieldy. Hopefully this will help with that.
It might not be appropriate for you, but a good way to handle MySQL backups is to maintain a mirror. This has the added benefit of being available as a fail-over and as a secondary instance where you can run reports or test long-running queries on current data without the risk of taking prod down.
Right. But the grandparent comment was suggestive of the possibility that he or she wanted the mirror to fulfil multiple roles, including being the backup.
Well, with a functional mirror to run manual queries against, you wouldn't be running the risk of running such a query on your prod DB anyway. But yeah, you still need a dump.
Thanks, I've already tried that. My main issue is EBS performance when writing the dump file to disk. The backups themselves don't impact database performance much, but writing up to 20 Gigs of a dump file to an EBS disk on a nightly basis is extremely slow. Maybe this Data Pipeline service will help bypass that.
Have you tried piping the dump directly to a compression tool? We use pbzip2 for our dumps, which works great if you have some CPU power to spare. The largest was around 8 Gigs uncompressed, but the total size of the plain text dumps is over 20 Gigs. Hardly an issue for EBS that way. The dumps did kill the DB for a few minutes several times before we started using cstream to cap the dump bandwidth.
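In case it helps, here is roughly what that pipeline looks like when driven from a script. This is a sketch under assumptions, not our exact setup: the function names, the `--single-transaction` flag and the rate value are illustrative, and it assumes `mysqldump`, `cstream` (`-t` caps throughput in bytes per second) and `pbzip2` (`-c` writes to stdout) are on the PATH, with MySQL credentials coming from an option file.

```python
import subprocess
from typing import List, Optional


def build_dump_pipeline(database: str,
                        max_bytes_per_sec: Optional[int] = None) -> List[List[str]]:
    """Return the argv list for each stage: mysqldump | [cstream] | pbzip2."""
    # --single-transaction is an assumption; it suits InnoDB tables.
    stages = [["mysqldump", "--single-transaction", database]]
    if max_bytes_per_sec is not None:
        # Throttle the stream so the dump does not saturate the database.
        stages.append(["cstream", "-t", str(max_bytes_per_sec)])
    # Compress on the fly so only the compressed dump hits the disk.
    stages.append(["pbzip2", "-c"])
    return stages


def run_pipeline(stages: List[List[str]], out_path: str) -> List[int]:
    """Chain the stages with pipes, write the last stage's output to out_path,
    and return every stage's exit code."""
    procs = []
    prev_stdout = None
    with open(out_path, "wb") as out:
        for i, argv in enumerate(stages):
            is_last = i == len(stages) - 1
            proc = subprocess.Popen(
                argv,
                stdin=prev_stdout,
                stdout=out if is_last else subprocess.PIPE,
            )
            if prev_stdout is not None:
                # Close our copy so upstream sees a broken pipe if downstream dies.
                prev_stdout.close()
            prev_stdout = proc.stdout
            procs.append(proc)
        return [p.wait() for p in procs]
```

Something like `run_pipeline(build_dump_pipeline("mydb", 5_000_000), "/backups/mydb.sql.bz2")` would then write only the compressed stream to disk; the database name, rate and path there are placeholders. Check that every returned exit code is 0 before trusting the file.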
You shouldn't really be trusting Amazon with your data warehouse or paying that much for the storage, but from a technical convenience standpoint AWS is probably the best solution for some of the horrid little inept kinds of organizations that I have encountered.
Totally. I know I create lots of business value when I spend a day dicking around with mysqldump and rsync and inotify and scp and hdfs. Who would want to use this kind of janitorial service when they could do it themselves?