Low Cost EC2 Instances With Burstable Performance (amazon.com)
231 points by jeffbarr on July 1, 2014 | 74 comments



Maybe I did something wrong with the setup, but the disks (type gp2) are really slow compared with Linode and DigitalOcean:

  ubuntu@aws:~$ dd bs=1M count=1024 if=/dev/zero of=test   conv=fdatasync
  1024+0 records in
  1024+0 records out
  1073741824 bytes (1.1 GB) copied, 23.1199 s, 46.4 MB/s
  ubuntu@aws:~$ sudo hdparm -tT /dev/disk/by-label/cloudimg-rootfs 

  /dev/disk/by-label/cloudimg-rootfs:
   Timing cached reads:   23292 MB in  1.99 seconds = 11704.62 MB/sec
   Timing buffered disk reads: 232 MB in  3.02 seconds =  76.70 MB/sec
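
dd only measures sequential throughput, though, and gp2 volumes are rated in IOPS (the baseline scales with volume size), so a random-read test is more telling. Something like this fio run (parameters are just an example, untested) would show it:

  fio --name=randread --rw=randread --bs=4k --size=1G \
      --direct=1 --runtime=60 --time_based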


Did you run these tests with EBS optimization enabled? Only with that do you get guaranteed bandwidth to your disk; without it, the network may be the bottleneck.


The t2 instances can't be EBS optimized.


If you're using EBS volumes, they have to be "warmed up" before you see their true performance. Even then they're variable. See http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ebs-prewa... for more details.
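
A read-only warm-up just touches every block once; something like this does it (assuming the volume shows up as /dev/xvdf -- for a brand-new empty volume, the docs describe a write pass instead, which destroys any data):

  sudo dd if=/dev/xvdf of=/dev/null bs=1M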


With I/O being the top bottleneck in modern databases and web applications, a lot of what people call "unpredictable performance" or "virtualization overhead" has mostly to do with sharing (and thin provisioning) storage devices between multiple servers/nodes.

I think there would be a huge market for a performance-oriented VPS provider who could provide each node with its own dedicated hard drives/SSDs. All major virtualization tech (KVM/Xen) already supports raw disk mode.

Obviously, space and SATA ports inside servers are a scarce commodity with off-the-shelf hardware, so this project would require at least some custom hardware to offer competitive pricing. I think the tiny mPCIe SSDs sometimes found in laptops would be a good area to explore.


> With I/O being the top bottleneck in modern databases and web applications, a lot of what people call "unpredictable performance" or "virtualization overhead" has mostly to do with sharing (and thin provisioning) storage devices between multiple servers/nodes.

The thing is, most of the problem with sharing a disk simply goes away once you go SSD.

Spinning disk has the characteristic that sequential access is pretty good... and random access is terrible.

If two processes both stream sequentially off the same disk at the same time, then unless your scheduling algorithm tolerates terrible latency, the disk ends up seeing random access. This is why spinning disk is so terrible for multi-user setups.

SSD doesn't have that problem. SSD has problems with writes that are not cell-aligned, but this is functionally similar to the problems raid5 has, and much like raid5, it's not a problem when it comes to reads.


EC2 is generally very expensive for CPU. RAM and storage are okay but CPU is crazy.

Does anyone know what recommends EC2 over DigitalOcean, Vultr, Linode, etc.? Are they more reliable? Enterprise features? Network bandwidth? Because right now they look hugely overpriced.

I've hosted on Digital Ocean and Vultr for some time and my uptime is great on both. I run constant ping testing and I do see little glitches from time to time between data centers, but that could be network weather on the global backbone. (I have a geo-distributed architecture so there's stuff running at five different locations.)


You can't look at EC2 as just a place to go for a VM. It's not worth using EC2 unless you either:

A) Already have a lot of other infrastructure on EC2.

B) Want to use the other services offered by AWS.

AWS is a collection of services and APIs that you can compose to build big, complex, scalable things. If you just need a VM host, AWS should be the last place you look.

On the flipside, if you have an application that could benefit from outsourcing some of your infrastructure, AWS could save you time and money. For example, we have AWS managing our DB server (RDS), Load Balancing (ELB), Redis/Memcache servers (ElastiCache), static media storage (S3), DNS (Route53), and some of our CDN capacity (CloudFront). We can manage all of these through the same API client (boto), and the inter-service latency is typically low.
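
The uniformity is the point: the same client covers every service. A few illustrative read-only calls via the AWS CLI (boto exposes the same APIs):

  aws ec2 describe-instances
  aws rds describe-db-instances
  aws elasticache describe-cache-clusters
  aws s3 ls
  aws route53 list-hosted-zones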


Amazon offers a full ecosystem of services - compute, object storage, CDN, DNS, DbaaS and a lot more - DO and Linode are VPS services with hourly pricing.


I use Linode for production sites. It's easy to create private networks. Performance is better than DO in my testing. I use Digital Ocean for my hobby sites like blog and quick apps since they have inexpensive instances.


Could you please elaborate on your 'private network' setup on Linode?


> EC2 is generally very expensive for CPU. RAM and storage are okay but CPU is crazy.

I'm leaning towards becoming an EC2 apologist on here, but here are quick benchmarks from a t2.micro versus both the $5 and $10 Droplets:

  sysbench --test=cpu --cpu-max-prime=40000 run

$5 Droplet ("2.0Ghz", bogomips 4000)- 99.4981s

$10 Droplet ("2.4Ghz", bogomips 4800) - 88.3740s

(I can't find any actual documentation detailing the $10 option being faster, so perhaps this is just random luck on instantiation)

t2.micro ("2.5Ghz", bogomips 5000) - 69.5248s

Now of course the t2.micro won't let you run that around the clock, which for many workloads is entirely fine: for a standard blog host and the like, or the overwhelming majority of server implementations, bursty CPU is exactly what natural workloads look like.

Adding a comparison of the cpuinfo for each:

Both droplets (identical cpuinfo flags) -

  flags : fpu de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pse36 clflush mmx fxsr sse sse2 syscall nx lm rep_good nopl pni vmx cx16 popcnt hypervisor lahf_lm

Amazon t2.micro-

  flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx rdtscp lm constant_tsc rep_good nopl xtopology eagerfpu pni pclmulqdq ssse3 cx16 pcid sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm xsaveopt fsgsbase smep erms

Of particular relevance is that the Amazon instance (E5-2670) exposes SSE4 and AVX to your VM, which for many workloads could dramatically increase its advantage.
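
If you want to verify what your own VM exposes, a quick one-liner (adjust the flag list to taste):

  grep -o -w 'sse4_1\|sse4_2\|avx' /proc/cpuinfo | sort -u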

I guess the whole point of this is that the vague CPU terminology the various cloud vendors use is seldom really comparable. However, to your core question: Amazon becomes a value proposition when you are using all of the parts -- S3, load balancers, elastic IPs, shared volumes, availability zones, security zones, VPCs, private networks... it's all a multiplier on the value of the platform.


As a quick addition on this, the m3.medium -- running on the same processor but governed differently -- takes 160 seconds to run the same benchmark (after repeated runs).

Amazon used to promote their instances via the somewhat comparable ECU metrics. Now, however, unless I'm missing something, you need to try to determine by narrative, because 1 vCPU is very much not equal to 1 vCPU on other instance types.


They still use ECU, but I'm not sure it's comparable across generations of instances, e.g. an m3 with more ECU than an older m1 instance of similar size seems at times to be slower.


The value proposition of these instance types seems to be entirely focused on CPU burst performance, with no local storage, no EBS optimization (though there are provisioned IOPS), and only moderate network performance that is shared.

Relatively poor disk performance is somewhat expected. I'm not sure how fair it is to compare it to instance volumes on other platforms, given the significantly reduced flexibility that brings with it.


Any thoughts on how this compares to simply spinning up a standard instance for a few hours then turning it off when you don't need it?

I run a service that needs to do about 72 hours' worth of processing each day, and it all needs to happen during a 3-hour window. That's a natural fit for spinning up a couple dozen instances and then killing them when they finish.

I'd love to see a comparison of what would happen if I kept the same amount of compute power on standby 24/7 using this new instance type.
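
For reference, the spin-up/tear-down side is only a couple of calls (sketch; the AMI ID, key name, and instance IDs are placeholders):

  aws ec2 run-instances --image-id ami-12345678 --count 24 \
      --instance-type c3.xlarge --key-name mykey
  # ...run the jobs, then:
  aws ec2 terminate-instances --instance-ids i-aaaa1111 i-bbbb2222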


It seems like this fits two needs, for smaller companies and/or people just getting started with EC2.

1. Laziness. Which I don't necessarily mean in a pejorative sense. Maybe someone just doesn't have time, yet, to learn/configure/maintain spinning up an instance for limited times.

2. Single instance. To spin up an instance, you need another computer. If you want that "manager" computer to be an instance at EC2, too, now you need two instances. With this approach, you can set up just one instance and get much of the same economic benefit.

EDIT: Also...

3. Predictable cost. If your manually spun-up instance turns out to need to run for 4 hours instead of 2, you get a bigger bill. With the t2 instances, you'll get slower compute (if you run out of "credits") but not a bigger bill.

Again, this probably appeals most to small/new customers?
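
On the predictable-cost point: T2 instances publish their credit balance as a CloudWatch metric, so you can see a slowdown coming. Something like this (instance ID and time window are placeholders):

  aws cloudwatch get-metric-statistics --namespace AWS/EC2 \
      --metric-name CPUCreditBalance \
      --dimensions Name=InstanceId,Value=i-12345678 \
      --start-time 2014-07-01T00:00:00Z --end-time 2014-07-02T00:00:00Z \
      --period 3600 --statistics Average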


> Single instance. To spin up an instance, you need another computer.

I think you can do this with CloudFormation, having it respond to the size of a work queue. However:

> Maybe someone just doesn't have time, yet, to learn/configure/maintain spinning up an instance for limited times.

This is why I can't answer the question above for certain; I got about that far in the documentation and went off to find a simpler solution (for me, tutum: https://www.tutum.co/ )


As far as #2 goes, you don't need another instance. You can do it with time-based autoscaling groups. This does present problems of its own, but ones that are not hard to solve.
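
A scheduled scale-up/scale-down pair looks roughly like this (untested sketch; the group name, times, and capacities are placeholders):

  aws autoscaling put-scheduled-update-group-action \
      --auto-scaling-group-name workers --scheduled-action-name scale-up \
      --recurrence "0 2 * * *" --desired-capacity 24
  aws autoscaling put-scheduled-update-group-action \
      --auto-scaling-group-name workers --scheduled-action-name scale-down \
      --recurrence "0 5 * * *" --desired-capacity 0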


Yeah, exactly. I do think, however, that a scheme where they automatically repurchase credits when the balance runs low would be nice, such that it just ends up costing something like double a normal instance when it does run low.


Well, a c3.xlarge is $0.21 per hour (their "compute optimized, current generation" line).

24 instances * $0.21 = $5.04 per hour

You could burn those for two hours, almost, to match the lowest $9.50 per month cost of what they're talking about in the blog.

The c3 approach would give you 96 vCPUs during that time. The t2.micro, for $9.36 or whatever per month, gives you one vCPU. I'd have to strongly favor spinning up 24 to 48 instances of the c3.xlarge and clocking the job in one to three hours if possible.


Burstable jobs? This is where Spot Requests shine. Current spot price for c3.xlarge is $0.032/hr.

24 * $0.032 = $0.77 per hour

I run all my CI infrastructure from spots for dirt cheap. Sure, it could all be yanked out from under me, but it's been running non-stop for over a year now. Plus, it keeps you from making "special snowflake" instances that you shouldn't. A minute and Puppet/Chef have got you a splendid new instance. ;)
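
The request itself is one call (sketch; the AMI ID is a placeholder and the bid price is just an example):

  aws ec2 request-spot-instances --spot-price 0.05 --instance-count 24 \
      --launch-specification '{"ImageId":"ami-12345678","InstanceType":"c3.xlarge"}'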


If you are running a service that needs to be up all the time this is ideal. For other scenarios, what you describe is probably appropriate.

With burstable instances, a t2.micro accumulates 6 CPU credits every hour, so it can run at 100% load for a full hour about once every 10 hours (a t2.small earns 12 credits per hour and a t2.medium 24, so they can burst proportionally more often).

It would be nice to have a credit window greater than 24 hours though.
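
To make the credit arithmetic concrete (one credit = one vCPU at 100% for one minute):

  6 credits/hour * 10 hours = 60 credits = 1 vCPU at 100% for a full hour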

EDIT: ColinCera pointed out the math is incorrect. Updated and removed erroneous conclusion.


That's not correct. The way you accumulate credits lets you run for a full hour at 100% CPU, once every 10 hours. So if you have a lightly loaded server most of the day, you can burst to 100% CPU for 2+ hours per day.


A t2.medium starts with enough credits for about 15 minutes of 2-core CPU saturation and accumulates 12 minutes/hour thereafter. In non-burst mode it is about 5x slower. For this type of workload you'd likely be better off with a c3 or m3. t2 is a better fit for long-term usage with periodic spikes (20% or less of total operational time).


It keeps your IP stable; I imagine there are some scenarios where that's a plus.


So does an Elastic IP.


Super interesting. If I did the math right, a 3-year heavy reserved t2.micro instance comes out to $4.48/mo, which is cost competitive with Digital Ocean. The proof will come in the benchmarks, but this may become my preferred hosting solution.


It's $77 for 1 year reserved if I'm reading it correctly. That's $6.44 per month for an instance with double the RAM of the DO $5 instance. The specs look like the size that DO is charging $10 for currently. For a 3 year reserved instance it's $4.48 a month for double the size of the DO $5 instance. There's also a free tier, so the first year is free to try it out.
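
Working backwards, if I'm reading the pricing page right, the $77 is an upfront fee plus a small hourly charge (the ~$0.003/hr is my back-calculation):

  $51 upfront + ($0.003/hr * 8760 hr) = $51.00 + $26.28 = ~$77.28/year
  $77.28 / 12 = ~$6.44/month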

DO was competitive with EC2 on price but not on features (and certainly not on security), now with the price advantage gone...

EDIT: corrected calculation


The price of an EC2 instance doesn't include data transfer though. For the $5 DO instance you get 1TB of traffic for free.

The price advantage is definitely not gone.


How many customers use more than 1GB of outbound traffic per month for a $5 server? Data transfer in is free on EC2 and the first 1GB outbound is free too according to the pricing page.


Isn't the first GB of your account traffic free with AWS, vs. DO giving you 1 TB per droplet?

While one EC2 instance may not use more than 1 GB (which is a very low quota unless you're CDNing everything), if you have a couple of instances you're almost certainly going over that.


$51 is only the upfront fee. It's $77 for the year, plus you need an EBS volume - say $6/year for a 10 GB magnetic EBS volume, or $12/year for an SSD one.


OK, so about $6.90 per month for 1 year then, including storage. That's still less than a 1 GB RAM DO server, which goes for $10 per month, and much less if you go for 3 years. Factoring in 1 year of free tier makes it even less.


It doesn't include storage. It doesn't include SSD storage. It doesn't include provisioned IOPS for SSD storage (well, only a ridiculously low amount). It doesn't include bandwidth. It doesn't include support.


The issue is that this offering is complex to understand, as opposed to DO, which is incredibly simple. It's actually pretty funny how hard this AWS offering is to understand; it takes many paragraphs of reading to figure it out.


There's nothing to understand with DO because they don't tell you what their definition of a "CPU" is. For all I know it could be oversubscribed in a worse way than EC2.


Most people don't care about the exact speed of the CPU for a server, as long as it's stable and doesn't lock up under light load, especially when purchasing a micro instance. If CPU utilization is high, then it's time to upgrade to another CPU or start another instance if possible.

It would be better to do real speed tests of each service to determine average "CPU" speed. I'm sure both services are constantly optimizing for both shared hardware usage and speed, so the stats would have to be updated regularly.


So if you don't care, you shouldn't penalize Amazon for telling you how it works.


I typically get 97% of one core for my $5 instance, FWIW (or 100%, depending on the load of the box, I presume).


AWS is a simple model: pay for what you use. Most cheap hosts (i.e. please-kill-me DreamHost) give you no SLA and you're rolling the uptime dice. Three years on DH and I wanted to kill myself. Three years on AWS and life is just splendid!

To understand your billing, you need to understand what you're consuming, which you always should. These credits add a little wrinkle, but also make the service cheaper and more deterministic. If you have credits, you'll get the CPU you bought with them.


I would replace 'super interesting' with 'Super complex'. It's an example of how you can make the price of a $10-40/mo server complex to the extent that you need to read the blog post numerous times before you understand the construct.

And even then, one still needs to factor in the 'other' costs like I/O or IOPS, disk (persistent/EBS), IPs, internet and inter-region data transfer… before you understand the real cost.

And then you need to compare to other instance types (which soon will cover the full alphabet -- c, cg, cr, g, h, i, m, r, t… ) and then other providers.

You still have several unresolved issues -

1. Are your assumptions on usage (CPU, I/O, internet, etc.) correct? Will they change?

2. How do I compare performance across providers for a given VM specification?

3. Can I get support when I need it?

And I am sure there are others.

It certainly means there is room for other players who just make it simple, whether they are infrastructure folk (like DO/Linode etc.) or platform plays that make the pricing understandable by the audience they are trying to target (like Heroku/Ninefold).


That assumes that Digital Ocean doesn't improve their offering via a hardware upgrade or price discount in the next 3 years.


You can be pretty sure Amazon will do that too.


The interesting thing about Amazon (vs the VPS market, where DigitalOcean, Linode, and I live) is that when amazon lowers prices, they lower prices for existing customers who don't make changes to their accounts. When a VPS provider like Linode lowers their prices, they usually charge existing customers the same amount, and simply give them more resources.

Just an observation. I'm not criticizing either way of doing things; obviously, lowering prices straight out is better for the customer, and keeping revenue stable while just upgrading hardware is better for the provider. Last time I lowered prices, I lowered prices directly, and just took the revenue hit. I'm planning my next upgrade now, and instead of lowering prices, I plan on giving everyone more ram/disk/ssd, while holding prices steady.

It is something I've thought about... the problem is that I'm going to have to go down by more than half, and it's way easier to lease enough hardware to more than double everyone's allocations than it is to double my customer base to make up for the lost revenue.


Perhaps I have selective memory, but I've never seen Linode lower prices, they just keep upping the specs on the lowest tier.

It reminds me of something I learned while working for Comcast years ago - never lower prices, just keep adding "value".


>Perhaps I have selective memory, but I've never seen Linode lower prices, they just keep upping the specs on the lowest tier.

Yes, exactly. I'm saying that is the standard way to do it in the VPS market, in part because until D.O. most of us were self-funding, and it's way easier to pay for double the compute resources than to deal with a 50% cut in revenue.

In the "cloud" market where amazon is, the standard way to do it is to directly lower prices.


AWS is a little bit a hybrid of both. If you're paying hour-to-hour, all cuts are immediate. But there is such a huge discount for reserved instances, that many large clients are using a large proportion of them. With a 3-year "heavy utilization" reserved instance, Amazon has gotten a significant % of the total price for running that instance up front, locked in for 3 years. Since the biggest part of the revenue (the reservation fee) is locked in, cutting the hourly rate only gives back a smallish part of the revenue to those kinds of clients.


Really? When they lower the price of the per-hour billing, they don't lower the locked-in fees?

Huh. In the VPS market, from what I've seen, the rule is "treat your existing customers as well as your new customers"

while, say, the co-location market is like the real-estate market. "Subsidize your new customers, and if they are still alive when the lease is up, take profits in the form of much higher renewal rent."

I guess what you describe with pre-pays is sort of inbetween. There's a difference in most minds, I think, between raising a price and just not lowering it when you perhaps could be expected to. Most people new to the real-estate market feel pretty bent out of shape when they find out that they have to pay significantly more in rent to renew their existing contract than they will pay if they move.


As one of your customers, I've already committed $X to my instance. I would be delighted to get 2x the resources rather than a 0.5x cost.


Thanks. That's good feedback. We are working hard on the upgrades, but I have been way too slow :(

I do observe that there seems to be a price floor phenomenon; for any customer, any price below $x is largely equivalent; they will go for the best thing they can get for $x, so providing a better product helps, but lowering the price below $x doesn't change the equation for that customer. Of course, $x is different for each person, so lowering your price does get you customers who had a lower value for $x.

I've already lost most of the customers whose $x was lower than what they were paying me at this point; I'm not losing customers nearly as quickly as I predicted. Right now, if I screw something up, of course, I lose the affected customers; I mean, it's really dramatic. You always lose some customers when you screw something up, but I lose way more now than when my prices were lower than the credible competition. But other than that, things have largely stabilized.


I should note that I'm a hobbyist/enthusiast type of customer; if you're losing business on price grounds, I can definitely see where a price cut would help.

(Also, it doesn't help that the wiki is crufty and out of date, and boot menu, last time I rebooted, was still on CentOS 5.)


> I should note that I'm a hobbyist/enthusiast type of customer; if you're losing business on price grounds, I can definitely see where a price cut would help.

I think similar principles govern business spending, only $x for them is usually higher. I have a couple of business co-lo customers who have been customers for like half a decade; some of them are still using the hardware they came in on. They could save a lot of money by upgrading hardware (and thus reducing their footprint) or even moving to "the cloud" at this point, because while co-locating modern hardware is cheaper than "The Cloud" - co-locating ancient hardware is not.

The idea is that it works for them, so they aren't going to fuck with it. I'd bet money, though, that if I fucked something up and caused them a serious outage, they'd be gone pretty quick.

>(Also, it doesn't help that the wiki is crufty and out of date, and boot menu, last time I rebooted, was still on CentOS 5.)

I just want to acknowledge those problems. We only have vague plans for the wiki, but we're actively working on upgrading the rescue image and the hypervisor (which, I imagine, is the part of the boot menu you are complaining about.) - these changes will probably not be implemented until our switchover to the new ganeti-based system, but... that should be soonish.


Be careful here. Despite announcing 42 price reductions over the 7 years of AWS existence, the M1 tier was only reduced for the first time in 3 years after Google/Azure dropped their pricing. They push through a huge number of price drops, but sometimes (actually often) their headline drop is one very small, unique charge, leaving everything else unaffected.


They will, but only for people who didn't lock in a reserved instance. Amazon doesn't typically upgrade instance types, but rather introduces new ones. So if you lock in a t2.micro instance for a 3-year reserved period, you are stuck with exactly those specs for 3 years. If they introduce a t3.* instance class next year, you don't get a free upgrade to the new specs.


A year or so ago, they introduced the ability to change the instance type within the same instance family [1].

[1] http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ri-modify...


That increases the flexibility, but afaict doesn't help with the upgradeability angle. When Amazon passes along the technological dividend by creating new instances with a better price/performance profile, it's usually as new family types. So your reserved t2.* instance can swap with other t2.* instances, but won't be able to take advantage of any t3.* instance that's introduced in the future.

It's not a huge deal, just something to take into account when projecting out costs: the 3-year reserved instances are locking in today's prices until 2017, in return for a discount over today's non-reserved prices. Whether this produces long term gains requires some assumptions about how the market will change over the next 3 years.


Pricing plans at DigitalOcean and Linode include some free data transfer, while you need to pay for data transfer on AWS.


And that's worth quite a lot, given Amazon's bandwidth is expensive. The 2 TB included in the Linode $10 account would run you about $200 with Amazon. Doubtful anyone is burning 2 TB on a $10 account, and I'm sure Linode has that in mind, but even if you're using 200 GB of bandwidth, that would still cost you over $20.


Pricing plans for Amazon also include the best security and interconnects bar none, the fastest access to the wide array of AWS and AWS-hosted services, and an ecosystem of tools and apps.

And of course Amazon has transparent disclosure for outages and security issues, unlike, say, Linode.


Don't forget that network transfer from an AWS server may be slower than from a Linode/DO one as well. Any benchmarks out there?


> The T2 instances use Hardware Virtualization (HVM) in order to get the best possible performance from the underlying CPU and you will need to use an HVM AMI.

I've always used paravirtual AMIs, as I understood that gets the best performance for a Linux box.

Given that I try to use the same self-baked base AMIs for various purposes (and instance sizes), I would either have to mix and match or switch everything to HVM. However, I have no clue what the practical consequences of that would be.

Can anybody enlighten me?


HVM gives the best performance because you can take advantage of certain hardware features through the hypervisor. It's basically more direct access to the hardware, which makes it faster as you don't have as much hypervisor overhead. Amazon's "enhanced" networking and SSDs need HVM to get a good chunk of performance.

Yes, you'd have to build new AMIs with HVM. It'd be easiest if you had some kind of configuration management so you didn't need as many AMIs baked. When I build machines, I use a script to handle the creation and mounting of any extra volumes on a machine that I have as "nonstandard". I have only 2 custom AMIs - one for PV and the other for HVM. You'll need at least both, because certain instances (t1.micro and m1.small come to mind) can only use PV.
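
The volume-handling part of such a script boils down to something like this (untested sketch; the size, zone, instance ID, device, and mount point are placeholders):

  # create a volume, then attach and mount it once it is "available"
  vol=$(aws ec2 create-volume --size 100 --volume-type gp2 \
      --availability-zone us-east-1a --query VolumeId --output text)
  aws ec2 attach-volume --volume-id "$vol" --instance-id i-12345678 --device /dev/xvdf
  sudo mkfs -t ext4 /dev/xvdf
  sudo mount /dev/xvdf /data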


HVM vs PV is confusing because as Xen improves, the performance characteristics of the two modes change. Brendan Gregg covered the differences in quite some detail in a recent blog post [1]. Basically, if you are running a new enough kernel on the guest OS, you will get better performance from HVM.

[1]: http://www.brendangregg.com/blog/2014-05-07/what-color-is-yo...


This seems really amazing. This workload pattern matches almost all of my small projects.


This looks like it's a reaction to (and effective solution for) the problem with t1 instances that made them largely useless (or a gamble at best) due to sharing a CPU with instances that run at full load all the time.


Also known as Amazon Droplets


This looks like a nice way to experiment with CoreOS, as it's not supported by DO or Linode but an AMI is available.


> This deceleration process takes place over the course of a 15 minute interval in order to provide a smooth and pleasant experience for your users.

The thrashing will increase gradually until user experience is pleasant.


Any recommendations for software builds? I usually go with c3.4xlarge for building Android platforms but wondering if there are alternatives out there.


I thought that cost was dominated by memory utilization instead of CPU utilization. How can AWS manage to pull this off?


That was my understanding too. At $9.50/mo per 1 GB instance, a server with 96 GB of RAM would bring in $912/mo.

A quick click around dell finds that a mid-range 1U rackmount server (R320) with that much RAM costs $3,135.

So a back-of-the-envelope calculation makes it seem workable, especially for high-RAM low-CPU configurations, which is what this is.

There are other tricks that they might be employing, such as swapping out part of RAM to SSDs behind the scenes, as well as compressing RAM contents. On low-load servers like these, typical usage would imply that RAM would be mostly static.


These instances have almost no memory. A physical server has 16 GB/core, and it looks like Amazon is putting ten 1 GB instances on each core.


What would be the recommendation for a cloud provider with high I/O throughput (as opposed to memory/CPU throughput)?



