A newbie’s guide to scaling AWS (revisu.com)
97 points by maxticket on Dec 30, 2011 | 19 comments



As a large-scale AWS user, I'm not sure I can agree with any of these recommendations.

Don’t use micro instances for real work

There are some use cases where micro instances work in production environments (such as MongoDB arbiters), but running anything in production on AWS on less than an m1.large instance is setting yourself up for failure.

Micro and small instances can be used for a personal web site or tiny blog, but nothing else. When I picture a micro or small instance, I envision a PC sitting under a desk doing work that a server in a rack should be doing.

Have servers on standby. Configure a large instance, stick it in an Elastic Load Balancer with your other web servers, and just stop it...

I really don't consider one stopped server on standby to be scaling AWS. In the case of web servers, you might need to provision 5-10 of them, and a single stopped instance is not going to get you there.

The better approach would be to have a server image prepared that can auto-provision itself (multiple times when needed) into the role defined through user-data passed to the image.

Scaling Web servers is an n+1 operation, not a +1 operation.
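
A minimal sketch of that pattern, using today's boto3 for illustration (in 2011 you'd have used boto, but the idea is the same; the AMI ID and the configure script are hypothetical):

    # Launch N identical web servers from a prepared image; each one
    # reads its role from the user-data passed at boot and provisions
    # itself accordingly.
    import boto3

    USER_DATA = (
        "#!/bin/bash\n"
        "echo role=web > /etc/instance-role\n"    # image reads this at boot
        "/usr/local/bin/configure-role.sh\n"      # hypothetical hook script
    )

    ec2 = boto3.client("ec2", region_name="us-east-1")
    ec2.run_instances(
        ImageId="ami-12345678",     # the prepared self-provisioning image
        MinCount=5, MaxCount=5,     # n+1: launch as many as the load demands
        InstanceType="m1.large",
        UserData=USER_DATA,         # boto3 base64-encodes this for you
    )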

Seriously, use RDS

Seriously, if you are doing any real MySQL work on AWS, DO NOT use RDS. The reality is that AWS doesn't do nearly the EBS tuning (RAID-0, kernel tuning, etc.) that you can do yourself on your own EBS-based MySQL implementation. See:

Getting Good IO from Amazon's EBS:

http://orion.heroku.com/past/2009/7/29/io_performance_on_ebs...

Running MySQL on AWS:

http://aws.amazon.com/articles/1663

Finally, for MySQL slaves, use RAID-0 over the ephemeral disks, which have dramatically better performance than EBS.
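
A sketch of the launch-time half of that: mapping the instance-store disks in via block device mappings (boto3 for illustration, AMI ID hypothetical); the stripe itself is then built inside the OS with mdadm:

    # Expose both instance-store (ephemeral) disks to a MySQL slave at
    # launch; the RAID-0 is then built inside the OS, e.g.
    #   mdadm --create /dev/md0 --level=0 --raid-devices=2 /dev/sdb /dev/sdc
    import boto3

    ec2 = boto3.client("ec2", region_name="us-east-1")
    ec2.run_instances(
        ImageId="ami-12345678",
        MinCount=1, MaxCount=1,
        InstanceType="m1.large",     # ships with two ephemeral disks
        BlockDeviceMappings=[
            {"DeviceName": "/dev/sdb", "VirtualName": "ephemeral0"},
            {"DeviceName": "/dev/sdc", "VirtualName": "ephemeral1"},
        ],
    )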


Good advice. But I think the context here is that there's a sudden spike of traffic from unexpected popularity that's likely to die out within a day or two. The mission is simply to survive that period.

If you're on Amazon EC2 or similar, it becomes phenomenally cheap to throw hardware at the problem. Running a single medium instance is $0.175/hour. If you need to go from 5 servers to 50 servers for two days, it's just $420 to get out of a 48-hour jam. Need to upgrade to the beefiest 68 GB, 26 ECU RDS instance while you're at it? $125 for 48 hours.
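
The back-of-the-envelope math, for anyone checking (the RDS hourly rate below is inferred from the $125 figure, not an official price list):

    # Quick arithmetic at the late-2011 on-demand rates quoted above.
    medium_rate = 0.175    # $/hour for a medium web instance
    rds_rate = 2.60        # $/hour, inferred, for the beefiest RDS class
    hours = 48

    print(50 * medium_rate * hours)   # 420.0 -> "$420 to get out of a jam"
    print(rds_rate * hours)           # 124.8 -> "~$125 for 48 hours"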

Performance tuning for 100x load that may never come again is usually hard to justify, so just know what your outs are if you're lucky enough to be swamped with traffic.

However, just be sure that you know how to scale out quickly and can launch new instances (ideally automatically). On most days our e-commerce business gets by on just a couple of servers, but we have autoscaling rules in place that spin servers up and down within a few minutes, roughly as sketched below.
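
A sketch of what such rules look like (boto3 for illustration; all names, IDs, and thresholds here are made up, not our actual config):

    # An Auto Scaling group plus a CloudWatch alarm that adds capacity
    # when CPU runs hot.
    import boto3

    asg = boto3.client("autoscaling", region_name="us-east-1")
    cw = boto3.client("cloudwatch", region_name="us-east-1")

    asg.create_launch_configuration(
        LaunchConfigurationName="web-lc",
        ImageId="ami-12345678",          # prepared self-configuring image
        InstanceType="m1.large",
    )
    asg.create_auto_scaling_group(
        AutoScalingGroupName="web-asg",
        LaunchConfigurationName="web-lc",
        MinSize=2, MaxSize=50,           # a couple of servers on quiet days
        AvailabilityZones=["us-east-1a", "us-east-1b"],
        LoadBalancerNames=["web-lb"],    # new instances join the ELB
    )
    policy = asg.put_scaling_policy(
        AutoScalingGroupName="web-asg",
        PolicyName="scale-out",
        AdjustmentType="ChangeInCapacity",
        ScalingAdjustment=2,             # add two servers per alarm firing
        Cooldown=300,
    )
    cw.put_metric_alarm(
        AlarmName="web-cpu-high",
        Namespace="AWS/EC2",
        MetricName="CPUUtilization",
        Statistic="Average",
        Period=300,
        EvaluationPeriods=2,
        Threshold=70.0,
        ComparisonOperator="GreaterThanThreshold",
        Dimensions=[{"Name": "AutoScalingGroupName", "Value": "web-asg"}],
        AlarmActions=[policy["PolicyARN"]],
    )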

Be sure to test regularly. Having a server boot up and add itself to the load balancer with old code/configs can wreak havoc on your database.


Configure a large instance, stick it in an Elastic Load Balancer with your other web servers, and just stop it.

I agree with allowing AWS to auto-provision N new instances when needed.

But from the article and with my financial hat on, when the large instance is stopped and left in a standby role, are you still paying for it?


You do not pay for an EC2 instance that is stopped. If the instance has attached EBS volumes you will continue to pay for the storage costs, but that is pretty minimal.

It is important to remember that a stopped instance is not guaranteed to be able to start again. If you have an m1.large stopped, there is a chance that Amazon will not have any m1.larges available in that zone/region when you try to start your instance back up.

Also know that the private DNS name will change when a stopped instance is started again, so anything configured to point at that particular instance will have to be reconfigured.
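
A sketch of the stop/start round trip showing both gotchas (boto3 for illustration; the instance ID is hypothetical):

    import boto3

    ec2 = boto3.client("ec2", region_name="us-east-1")
    iid = ["i-0abc1234"]

    ec2.stop_instances(InstanceIds=iid)     # billing stops, EBS persists
    ec2.get_waiter("instance_stopped").wait(InstanceIds=iid)

    # start_instances can fail with InsufficientInstanceCapacity if the
    # zone has no m1.larges free, so be prepared to retry or relocate.
    ec2.start_instances(InstanceIds=iid)
    ec2.get_waiter("instance_running").wait(InstanceIds=iid)

    desc = ec2.describe_instances(InstanceIds=iid)
    new_dns = desc["Reservations"][0]["Instances"][0]["PrivateDnsName"]
    print(new_dns)   # changed on restart; repoint anything using the old one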


But from the article and with my financial hat on, when the large instance is stopped and left in a standby role, are you still paying for it?

You aren't paying for the CPU time, only for the S3/EBS storage space that the image occupies (which is negligible for any normal single instance/image).


Agreed on all the scaling stuff. If you really serve traffic, m1.large should be your go-to instance size unless you're sure you know better. And being "ready to scale" means that if you need to serve 2x, 20x, or 100x the traffic, you can have 2, 20, or 100 servers booting up and self-configuring (including adding themselves to the load balancer) at the click of a button, not a few servers sitting around on standby waiting to be enabled...
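
The "adding themselves to the load balancer" step can be as small as this at boot (boto3 and the classic ELB API for illustration; "web-lb" is a hypothetical load balancer name, and this assumes the token-less IMDSv1 metadata service is reachable):

    import urllib.request
    import boto3

    # The metadata service exposes the instance's own ID.
    instance_id = urllib.request.urlopen(
        "http://169.254.169.254/latest/meta-data/instance-id"
    ).read().decode()

    elb = boto3.client("elb", region_name="us-east-1")
    elb.register_instances_with_load_balancer(
        LoadBalancerName="web-lb",
        Instances=[{"InstanceId": instance_id}],
    )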

There's one exception: if you've got "internal" servers that are only serving requests from you, or running occasional cron jobs (for monitoring and stuff like that), and you're sure that user traffic changes will never affect them, by all means go for a micro or a small. Those things aren't part of your publicly visible stack, so there's no sense wasting money.

Seriously, if you are doing any real MySQL work on AWS, DO NOT use RDS. The reality is that AWS doesn't nearly do the EBS tuning that you yourself can do (RAID-0, kernel tuning etc) on your own EBS based MySQL implementation.

Do you have any numbers on the performance difference between a well tuned EBS-based implementation and RDS?

Because speaking from experience, EBS-based MySQL machines are always, without fail, the primary maintenance headache in any stack. They're always the machines that wake engineers up at 4 AM because of some EBS crappery or some such nonsense that necessitates intervention, and the promotion/backup/restore procedures can be touchy and error-prone. That's not even to mention the work involved in setting them up, testing them, and getting them tuned right, which is serious work compared to the "pick a name, size, version, and launch!" process that RDS provides.

Most of what I could find seemed to suggest that the performance differences between RDS and well-tuned EBS MySQL servers of comparable size were relatively small. For instance, from http://www.mysqlperformanceblog.com/2011/08/04/mysql-perform...: "My benchmarks generally showed that I was not able to outstrip RDS 5.1's performance with the combination of stock MySQL 5.1, EC2, and a 10-volume EBS RAID."

If you think that's wrong and the difference is really much higher, I'd be interested to know.

Part of me is skeptical that any serious differences would persist for very long: if it's so easy to do this "right" so that you end up with a EBS+EC2 MySQL server that significantly outperforms RDS for general-purpose workloads, why wouldn't the RDS team set up RDS servers that way behind the scenes? I would think that Amazon should know how to efficiently utilize its own EBS volumes, right?


I can attest to the unreliability of AWS Micro instances for production environments. Two examples:

Ex 1: We had a request triggering an asynchronous job that would take 3-4 seconds of processor time. This job was throttled several times, so even under low traffic, each time one of those requests arrived it consumed an entire micro instance, making it unavailable for anything else. Switching to small instances solved the problem.

Ex 2: When launching a new load balancer, due to some sort of stale DNS/Elastic IP configuration, it's not uncommon to be hit hard by traffic intended for the previous owner of the given IP. We've seen this happen several times in the past. Again, switching from 10+ micro instances to 2 small instances cleared up the problem.

There's an interesting post by Greg Wilson addressing the throttling problem better than I could: http://gregsramblings.com/2011/02/07/amazon-ec2-micro-instan...


Thank you for the Greg Wilson link.


I've been using a micro instance for almost a year (the free tier) to run http://nsfw.in/ successfully. And the traffic on the site is not bad at all, with 100,000+ pageviews a month on average. You have to know the right setup for your requirements and tweak it.


One of our problems in scaling is not starting new instances, but rather ensuring that new instances have the latest version of the platform.

Also, we are running a fantasy soccer game with trading windows, so the server load near trading window closure (the last hour or so) is orders of magnitude higher than the average load. Ideally we would scale up when the load demands it, but with the platform versioning issue unresolved we need to run our platform at "full scale" almost constantly (or scale manually, which is a really bad idea when someone forgets to scale back up again for the weekend).

We're running in a WISC (Windows/IIS/SQL Server/C#) setup.


What if, every time you did a deploy, you afterwards kicked off a job that wrote a snapshot and brought up instances based off that? Then at any point you'd only ever be at most one deploy behind.
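
A sketch of that post-deploy job (boto3 for illustration; all names and IDs are hypothetical):

    # Bake an AMI from the freshly deployed instance, then point the
    # Auto Scaling launch config at it, so auto-scaled instances are
    # at most one deploy behind.
    import time
    import boto3

    ec2 = boto3.client("ec2", region_name="us-east-1")
    asg = boto3.client("autoscaling", region_name="us-east-1")

    image = ec2.create_image(
        InstanceId="i-0abc1234",               # the just-deployed box
        Name="platform-%d" % int(time.time()),
    )
    ec2.get_waiter("image_available").wait(ImageIds=[image["ImageId"]])

    # Launch configurations are immutable: create a new one, swap it in.
    lc_name = "web-lc-" + image["ImageId"]
    asg.create_launch_configuration(
        LaunchConfigurationName=lc_name,
        ImageId=image["ImageId"],
        InstanceType="m1.large",
    )
    asg.update_auto_scaling_group(
        AutoScalingGroupName="web-asg",
        LaunchConfigurationName=lc_name,
    )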


Yes, RDS is great for the money, considering the ease of backups and replicas, which are normally a pain in the ass for any sysadmin.

Overall, this article has little to do with scaling on AWS. Running any sort of production website on a micro instance is pretty foolish, except perhaps a blog or static website.


I think I may have had less idea about what your product does after clicking "See Revisu in Action"; three cat pictures do not a demo make.


Yeah, we're working on that. Hopefully the upcoming video helps a bit more. It's just one designer and one developer, so we gotta triage some things while we build the product.


This should be called "A guide to scaling AWS written by a newbie".


zing!


You're scaling AWS and you're a newb? You must be terrified!


Revisu is pretty sweet, great blog as well.


Oh man, Fake Grimlock is the man! Er... Giant bird robot.



