New Amazon EC2 Instance Type - The Cluster Compute Instance (aws.typepad.com)
71 points by jeffbarr on July 13, 2010 | hide | past | favorite | 29 comments



This new cluster compute instance size is essentially a variation of the m2.4xlarge instance (2 x X5550 Nehalem - 27 ECUs) with the following changes: 1. Faster X5570 CPUs 2. 10G network 3. HVM

I think the most noteworthy feature is the 10G NIC, which is more conducive to clustering than the standard GigE NIC on other instance sizes. I'm not aware of any other public IaaS provider offering 10G networking currently.

In terms of raw performance the X5570 will probably perform very well relative to other IaaS providers. EC2 is one of the few IaaS clouds that utilizes a heterogeneous hardware environment to enable better scalability between instance sizes. Other providers like Rackspace Cloud use homogeneous environments (Opteron 2374 in the case of Rackspace) and rely on hypervisor CPU throttling to scale CPU between different instance sizes. On the low end, homogeneous clouds tend to offer better value, but they are not well suited to HPC applications on the high end (e.g. a 2GB Rackspace Cloud server kills an EC2 m1.small, but the largest 16GB instance lags far behind EC2 m2.* instances). In fact, in some homogeneous clouds CPU performance is fairly constant from the smallest to the largest instances (you are essentially just paying for more memory).

I recently wrote a blog post comparing CPU performance of 20 different IaaS providers including EC2 and Rackspace Cloud. In it I used a compilation of 19 different CPU related benchmarks to produce an ECU metric for about 150 different cloud server configurations:

http://blog.cloudharmony.com/2010/05/what-is-ecu-cpu-benchma...


I think the most noteworthy feature is the 10G NIC

The NIC is not the most interesting part, but the network is: not only do these new instances offer 10 GigE, but they also offer full bisection bandwidth to all instances in a placement group (=> no oversubscription of core switches). That is pretty unusual, and a big win for network-intensive applications.


Penguin Computing's on-demand service offers both 10 GbE and InfiniBand.


Each Cluster Compute Instance consists of a pair of quad-core Intel "Nehalem" X5570 processors with a total of 33.5 ECU (EC2 Compute Units)

Aha, we finally have an EC2 Compute Unit comparison other than the vague "1.0-1.2 GHz 2007 Opteron or 2007 Xeon processor."

The X5570 is basically a server-grade 2.93GHz Core i7, and it equates to 16.75 ECU (since the "pair" is 33.5 ECU). You can then do very vague and imprecise extrapolation and assume a regular 2.66-2.8GHz i7 is in the 12-15 ECU area (which makes most EC2 instances look rather anemic and probably explains why the Redis benchmarks on EC2 seem so poor).
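Rough sketch of that extrapolation (assuming naive linear scaling with clock speed, which real benchmarks won't follow exactly):

```python
# Back-of-the-envelope ECU extrapolation. Assumes ECU scales linearly with
# clock speed within the Nehalem family -- a crude simplification.
pair_ecu = 33.5              # Amazon's figure for the pair of X5570s
x5570_ecu = pair_ecu / 2     # 16.75 ECU per quad-core X5570 @ 2.93 GHz

for clock_ghz in (2.66, 2.80):                  # typical desktop i7 clock speeds
    estimate = x5570_ecu * clock_ghz / 2.93
    print(f"{clock_ghz} GHz i7: ~{estimate:.1f} ECU")
# -> ~15.2 and ~16.0 ECU; discount a bit for the desktop parts' slower memory
#    subsystem and you land in the 12-15 ECU ballpark suggested above.
```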


Yeah man, I've been telling people for a while that EC2's offerings are high priced and low powered. I never would have thought a single i7 would be more than 12x the power of an EC2 standard unit though. Crazy.


If you're hosting a website, it makes more sense to rent dedicated servers. EC2 is great for off-loading peak traffic, though.


I don't agree. I've ditched almost all my dedicated servers across all kinds of apps that I run. I now use Rackspace cloud servers as my universal tool for everything I can.

I used to use EC2, but I've found I get significantly better performance (and of course support) at Rackspace for the same price. Also, avoid Rackspace Cloud Sites; the performance is pretty awful.

Anyway, back to the dedicated server point. If you use virtual servers you can very easily replicate images, spin up copies, and scale your compute power on the fly. Case in point: I was running one of my apps on a $10/month instance on Rackspace. We got a TechCrunch article out of nowhere and the site instantly died under the traffic. I resized the server on the fly and boom, straight back up.* Once the peak died down I scaled the server back down. A couple of months down the track we get near-TechCrunch traffic every day and I haven't had to scale the code base at all - I just keep it running on a larger instance. Much, much more cost effective.

* OK, not straight back up. Unfortunately, resizing a Rackspace cloud instance under heavy load takes ages (30+ minutes isn't unusual, compared with 5 minutes under light load). I've found that the best solution is to spin up a copy on another instance and change the DNS to point to the new one. I wish Rackspace would add some way to temporarily block traffic to a server so it can scale. Some kind of "panic, scale me up" button would be great.
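In case it helps, here's the shape of that workaround as a sketch; every helper below is a hypothetical placeholder, not the Rackspace (or any other provider's) actual API:

```python
# Hypothetical sketch of "spin up a copy, then repoint DNS" scaling under load.
# None of these helpers exist in any real provider SDK; they stand in for
# whatever imaging/DNS tooling you actually use.

def snapshot(server: str) -> str:
    print(f"imaging {server} ...")                  # original keeps serving meanwhile
    return f"{server}-image"

def boot_from_image(image: str, flavor: str) -> str:
    print(f"booting a {flavor} server from {image} ...")
    return "203.0.113.10"                           # new server's public IP (example value)

def point_dns_at(hostname: str, ip: str) -> None:
    print(f"pointing {hostname} at {ip} (keep the DNS TTL short ahead of time)")

def scale_by_cloning(busy_server: str, bigger_flavor: str, hostname: str) -> None:
    image = snapshot(busy_server)
    new_ip = boot_from_image(image, bigger_flavor)
    point_dns_at(hostname, new_ip)                  # retire the old box once traffic drains

scale_by_cloning("web1", "4GB", "www.example.com")
```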


You actually helped make his point.


Not at all. If you start with a dedicated server you can't offload peak traffic as easily. Always going with a virtual server allows you to scale on the fly and cuts costs when not in use.


This observation explains 99% of political debates. Everybody is making points that don't necessarily support their own anecdotal biases, but they keep charging ahead anyway.


My research group has been doing HPC type stuff on EC2. In fact the press release copy for this sounds almost exactly like what we've been writing when trawling for cash. Of course Amazon also gave us a grant (in AWS credits), so maybe we shouldn't feel bad if any of it was directly lifted...


With 880 Cluster Compute Instances (7040 cores):

> This result places us at position 146 on the Top500 list of supercomputers.

And $1408 per hour.


I wonder how that compares to the cost of booking time on a system of similar class, assuming said systems can be booked?


> (1759.00 / 41.82) * 880 * 1.60 = 59222.1903

And for almost 60k per hour, you will be number 1! How long till AWS has the capacity to top the chart?
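Unpacking that arithmetic (my reading of the figures: 41.82 TFlops is what the 880-instance run achieved, 1759.00 TFlops is roughly the Rmax of the current #1 system, and $1.60/hour per instance is implied by the $1408/hour quoted above):

```python
# Cost arithmetic using the figures quoted in this thread.
instances = 880
price_per_hour = 1.60                    # $/hour per instance (1408 / 880)
print(instances * price_per_hour)        # 1408.0 -> the "$1408 per hour" above

# Naive linear extrapolation to match the #1 Top500 system:
top1_tflops = 1759.00                    # presumably the current #1 machine's Rmax
run_tflops = 41.82                       # what the 880-instance run achieved
cost = (top1_tflops / run_tflops) * instances * price_per_hour
print(round(cost, 2))                    # ~59222.19 -> the "almost 60k per hour"
```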


If it's a one-off computation then a few thousand for a relatively hassle-free setup is bearable for many use cases. I've seen many clusters running idle just because customers initially bought more metal than they could handle. Peak load != continuous load.


Considering how much you would have to pay for the power alone, that doesn't seem unfair.


Using the extra large high-CPU instances is actually cheaper per compute unit. You just lose I/O and memory.
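For concreteness, a rough per-ECU price comparison; the c1.xlarge figures (20 ECU at roughly $0.68/hour on-demand) are my assumption from memory of current pricing, while the cc1.4xlarge figures come from this thread:

```python
# Rough $ per ECU-hour comparison. The c1.xlarge numbers are assumptions
# (on-demand pricing from memory); cc1.4xlarge numbers are from this thread
# (33.5 ECU at $1.60/hour).
for name, ecu, price in [("c1.xlarge (high-CPU XL)", 20.0, 0.68),
                         ("cc1.4xlarge (cluster)", 33.5, 1.60)]:
    print(f"{name}: ${price / ecu:.4f} per ECU-hour")
# -> ~$0.034 vs ~$0.048 per ECU-hour: the high-CPU instances come out cheaper
#    per compute unit, at the cost of memory and I/O.
```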


Exactly. All that compute power may not be useful if you need the I/O and memory.


Hmm... AWS is evolving relentlessly; GAE engineers should work harder (and faster) to catch up. It will be an interesting battle (and let's not forget about Microsoft, or even Apple).


It is interesting that the new Cluster Compute Instances use HVM instead of paravirtualization. Does this mean AWS is moving away from Xen?


Xen supports both PV and HVM.


Does this mean they are running Windows under paravirtualisation?


The post is signed quite ambiguously as "Jeff". Given that it's Amazon making a fairly large announcement here, they could stand to include a last name there...


That's me...


Hi Jeff, can you say something about why Cluster Compute Instances use HVM instead of paravirtualization?



With so many different options, it makes me wonder if they could offer a choose-your-own setup where you pay for the cores/memory/disk you select.


Any good ideas on what you could use this for?


Crunching numbers (such as estimating where the oil spill is spreading, weather prediction, etc.)



