The thing I learned that blew me away is that with their new networking stack, the cross-AZ communication within a region is "always <2ms latency and usually <1ms".
As mentioned in the slides, that's the same latency usually associated with SSDs and 100x better than the latency typically associated with cross-region networking. This suggests to me that running distributed databases in multiple AZs has almost no latency penalty (e.g. you won't even pay the eventual consistency / replication lag taxes that you might think are a danger). That's pretty damn cool.
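If you want to sanity-check that number yourself, here's a minimal sketch of the kind of probe you could run between two instances in different AZs. The hostname and port are placeholders, and it assumes you've started a simple TCP echo service on the far end:

    # Minimal cross-AZ RTT probe: open a TCP connection to a peer in another AZ
    # and time small request/response round trips.
    import socket
    import time

    HOST = "peer-in-other-az.internal"  # placeholder: private DNS name of the peer instance
    PORT = 7777                         # placeholder: port of a simple TCP echo service

    def measure_rtt(samples=100):
        rtts = []
        with socket.create_connection((HOST, PORT)) as sock:
            sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)  # avoid Nagle delays
            for _ in range(samples):
                start = time.perf_counter()
                sock.sendall(b"ping\n")
                sock.recv(64)  # wait for the echoed bytes
                rtts.append((time.perf_counter() - start) * 1000)  # milliseconds
        rtts.sort()
        print(f"median RTT: {rtts[len(rtts)//2]:.3f} ms, "
              f"p99: {rtts[int(len(rtts)*0.99)-1]:.3f} ms")

    if __name__ == "__main__":
        measure_rtt()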
The big takeaway from Hamilton's talk was actually something that dawned on me when they released the c3/m3/i2 instances: what they have done with SR-IOV.
The software stack that implements the AWS VPC fast path must be running on the NIC itself. This is a big deal.
Being able to implement the required isolation/tunneling/encapsulation logic and routing in the NIC is a huge performance boost and drastically simplifies the hypervisor. Well, the software portion of the hypervisor at least.
If you listened closely to the talk, he laid out the latency impact of each layer in the networking stack, from the fibre (nanoseconds) up to the software stack (milliseconds), several orders of magnitude worse than anything else in the stack.
The reason why the quoted latency is so large is how para-virtualized networking is implemented. In order for it to be performant in bandwidth terms it needs to use large ring buffers with many segments. This is very throughput optimised and latency suffers a ton as a result.
By moving all of the queuing to the NIC you get a bunch of benefits, namely that dom0 (basically the host in Xen terminology) is no longer involved in pushing packets and you are not incurring the cost of the Linux networking stack 2x. In the paravirt model the skbs are transferred across the circular buffer into the host OS, where they are injected into a virtual interface and thus traverse the full net stack again.
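To put a rough number on that buffering point, here's a back-of-envelope calculation. The ring depth and frame size are illustrative guesses, not the actual netfront/netback or AWS values:

    # Back-of-envelope: worst-case queuing delay a packet can see behind a full,
    # throughput-tuned ring. The ring depth and frame size are illustrative only.
    ring_entries = 4096   # assumed descriptor ring depth
    frame_bytes  = 1500   # typical MTU-sized frame
    link_gbps    = 10     # 10 Gbit/s virtual link

    bytes_queued  = ring_entries * frame_bytes
    drain_seconds = (bytes_queued * 8) / (link_gbps * 1e9)
    print(f"{bytes_queued/1e6:.1f} MB queued -> up to {drain_seconds*1e3:.2f} ms of added latency")
    # ~6.1 MB queued -> up to ~4.92 ms of added latency: milliseconds of delay from
    # buffering alone, which dwarfs the microseconds the wire itself contributes.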
In the SR-IOV model the address space of the virtual NIC is mapped into the guest OS using Intel's IOMMU extensions and the guest is then able to communicate directly with the NIC, giving essentially bare-metal performance.
If SR-IOV were the only improvement it would be impressive; however, it's the consequence of its existence that makes the biggest difference. If the guest is talking directly to the NIC, then all of the encap/decap is HW accelerated too, and in theory this means the full networking stack is end-to-end in HW.
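As a sanity check, you can tell from inside a Linux guest whether you're on the SR-IOV path or the paravirt path by looking at which kernel driver is bound to the interface. A rough sketch (the interface name is an assumption, adjust as needed):

    # Check whether the guest's NIC is an SR-IOV virtual function or a Xen
    # paravirtual device by reading which kernel driver is bound to it.
    # Assumes a Linux guest and an interface named eth0.
    import os

    IFACE = "eth0"  # assumption: adjust to your interface name

    driver_link = f"/sys/class/net/{IFACE}/device/driver"
    if os.path.islink(driver_link):
        driver = os.path.basename(os.readlink(driver_link))
    else:
        driver = "unknown"

    # "ixgbevf" (Intel 82599 VF) indicates SR-IOV enhanced networking on the
    # c3/i2-era instances; "vif" / netfront indicates the paravirtual path.
    print(f"{IFACE} is driven by: {driver}")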
Note: standard out-of-the-box SR-IOV allows VLAN tagging/stripping outside of the guest's control; maybe they are simply using this in conjunction with a layer 3 switch to handle the VPC stuff?
I just watched that YouTube video. I had thought that VPCs were a clever combination of VXLAN or GRE tunnels; I had no idea how custom it was. That's crazy in combination with SR-IOV, as you pointed out. I gather they are either running the VPC stuff on the card or on whatever it's connected to.
Probably something closer to MPLS, all things considered. If you think about it, they effectively are a carrier network with many virtual customer networks on top.
MPLS has a bunch of advantages over VXLAN and is decidedly L3, which is more suited to AWS VPC. The real question, however, is what the control plane looks like.
They may go pure carrier style and just do tagging and resource control on the NIC, but it's entirely possible for the NIC to be doing much more, especially if they have a dynamic control plane similar to OpenFlow (though probably not exactly OpenFlow; it's less useful than it sounds).
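Purely to make the comparison concrete (AWS's actual encapsulation is proprietary, so this isn't necessarily what they do): a VXLAN-style overlay tags each tenant with a 24-bit VNI in an 8-byte header, while an MPLS-style label stack entry packs a 20-bit label into 4 bytes. Either way, the NIC or switch gets a compact per-customer tag it can match on in hardware:

    # Illustrative only: AWS's VPC encapsulation is proprietary and not public.
    # These helpers just show the two tagging schemes being compared above.
    import struct

    def vxlan_header(vni: int) -> bytes:
        """8-byte VXLAN header (RFC 7348): 'I' flag set, 24-bit VNI, reserved bits zero."""
        return struct.pack("!II", 0x08 << 24, (vni & 0xFFFFFF) << 8)

    def mpls_label(label: int, tc: int = 0, bottom_of_stack: bool = True, ttl: int = 64) -> bytes:
        """4-byte MPLS label stack entry: 20-bit label, 3-bit TC, S bit, 8-bit TTL."""
        entry = ((label & 0xFFFFF) << 12) | ((tc & 0x7) << 9) | (int(bottom_of_stack) << 8) | (ttl & 0xFF)
        return struct.pack("!I", entry)

    # The tenant ID here is made up.
    print(vxlan_header(4242).hex())  # 0800000000109200
    print(mpls_label(4242).hex())    # 01092140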
I would love to see a talk on AWS VPC internals rather than sitting around taking pot shots. :)
> This suggests to me that running distributed databases in multiple AZs has almost no latency penalty (e.g. you won't even pay the eventual consistency / replication lag taxes that you might think are a danger).
Indeed. It would be interesting to know the distance between two AZs, assuming that 2ms is the RTT, and to account for the other factors involved: switching and network latency, plus encryption/decryption on cross-AZ traffic (if they do that).
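Back-of-envelope, assuming light travels through fibre at roughly 200,000 km/s (~5 microseconds per km): the distance itself can only account for a small slice of that 2ms.

    # Rough bound: how far apart could two AZs be if the 2 ms figure were pure
    # fibre propagation? Assumes ~200,000 km/s propagation speed in fibre.
    rtt_ms         = 2.0
    one_way_s      = (rtt_ms / 2) / 1000
    fibre_km_per_s = 200_000

    max_distance_km = one_way_s * fibre_km_per_s
    print(f"upper bound on separation: ~{max_distance_km:.0f} km")  # ~200 km

    # Conversely, a 10 km separation costs only ~0.05 ms one way, so at these
    # distances the RTT is dominated by switching hops and the software stack,
    # not by the fibre itself.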
> The above shows the US East region in Ashburn, Virginia. It has five availability zones, and these are protected areas that are isolated from each other by a couple of kilometers ...
> This is a course of action, Hamilton said laughing, where people “would get you a doctor and put you in a nice little room where you were safe and you can’t hurt anyone.”
This is the first reference I've heard to economies of scale and "blast radius" concerns (i.e. how much damage occurs if a data center goes down). Apparently Amazon feels that at around 80,000 (or so) servers, it makes more sense to build new data centers than to make existing ones bigger.
This is why availability zones have multiple data centers (as many as 6 (10?) in US-East).
Also, while I was aware that Amazon was looking at building their own network stack, I wasn't aware that they'd replaced all their Cisco/Juniper gear with white-label ODM hardware running their own custom software stack. Now that's a company that takes networking seriously.
Bespoke network equipment and associated software stacks are what everything in the datacenter, and hopefully in the office/home, will be running in the next five years.
Rather than bespoke, hopefully commodity. We've already switched to commodity networking hardware, and will never go back where possible. (Currently big edge routers still need to be from proprietary vendors, I believe.)
I'm a longtime AWS user. Recently I decided to see how the competitors' cloud offerings stack up, because I've read a lot about how the second movers in the space have caught up to Amazon, and that now there's basically no difference between the offerings.
I decided first to try Google - specifically Google App Engine. Just to see if I could get a plain vanilla base case working quickly. And my initial reaction was that AWS is still head-and-shoulders above everyone (or at least Google). The Google UI and setup process seemed ridiculously complicated and unfriendly. With AWS, I was up and running almost immediately. Not so with Google.
So I immediately ran back to AWS and dropped Google. I'm not sure if my bad experience with Google was because I had framed my expectations through my AWS experience and thus wasn't able to use Google the way it was intended to be used. But it just seemed way too unfriendly. It seemed to require needless dev installs that should just be automated.
When I went back to AWS, just the sheer amount of services they offer seemed staggering by comparison. Google still has products in beta that AWS already offers as mature services.
I was going to try out other competitors like Azure, Digital Ocean, etc. but now feel like there's no need. AWS is just good.
I'm finding this testimonial a bit difficult to believe. App Engine is very obviously a different product from the whole AWS stable, and that's something that you should be aware of if you're in a position to be comparing them.
AWS is a high-quality, extensive offering, but it's not suitable for every situation. The 'sheer amount of services' are in some cases lacklustre reimplementations of services you could run yourself on EC2, for example.
What this boiled down to is "I wanted to compare things to AWS, so I had a half-assed look at something that really isn't a competitor, then immediately stopped looking." That's not really convincing.
It's also missing a lot of important functionality if you are trying to implement anything other than the basic case.
A perfect example would be the difference between GCE networking and AWS VPC. GCE networks support routes with priorities, and if you insert multiple routes with equal priorities into the routing table it does what you would expect: equal-cost multi-pathing.
This makes it really easy to implement proper scalable NAT for private instances, which is just pure pain in AWS VPC.
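For anyone who hasn't run into it, the behaviour described above is basically flow-hash ECMP. Here's a toy sketch of the idea; it is not GCE's or AWS's actual implementation, and the gateway names are made up:

    # Sketch of flow-hash ECMP: traffic is spread across several equal-priority
    # next hops (e.g. NAT instances) by hashing the flow 5-tuple, so all packets
    # of one flow always take the same path. Illustrative only.
    import hashlib

    NAT_GATEWAYS = ["nat-a", "nat-b", "nat-c"]  # hypothetical next-hop instances

    def pick_next_hop(src_ip, src_port, dst_ip, dst_port, proto="tcp"):
        flow = f"{src_ip}:{src_port}->{dst_ip}:{dst_port}/{proto}".encode()
        digest = hashlib.sha256(flow).digest()
        index = int.from_bytes(digest[:4], "big") % len(NAT_GATEWAYS)
        return NAT_GATEWAYS[index]

    print(pick_next_hop("10.0.1.5", 40123, "93.184.216.34", 443))
    # Every packet of a given flow hashes to the same gateway, so the per-flow
    # NAT state on that box stays valid.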
There are many more examples of this and AWS is not the only culprit, both GCE and Azure have either missing features or mis-features that make me want to flip a desk sometimes.
Digital Ocean really shouldn't be compared to AWS for anything other than EC2. For that, Digital Ocean is pretty amazing, and at least as easy to use as AWS in my experience. The value proposition is far greater for Digital Ocean, if all you need is EC2. The 3 TB of bandwidth you'll get on a $20 DO account is alone worth ~$300 on AWS. Linode is similar, in terms of value proposition and ease of use.
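Rough numbers behind that comparison, assuming AWS data-transfer-out pricing in the $0.09-0.12/GB range at the time (prices change, so treat this as ballpark):

    # Rough check of the bandwidth comparison above.
    included_tb = 3
    gb = included_tb * 1000
    for price_per_gb in (0.09, 0.12):
        print(f"{included_tb} TB out at ${price_per_gb:.2f}/GB = ${gb * price_per_gb:.0f}")
    # 3 TB out at $0.09/GB = $270
    # 3 TB out at $0.12/GB = $360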
I really don't think GAE is comparable to AWS in that way. GAE offers "platform as a service," where the actual infrastructure (e.g. configuring actual servers, virtual or otherwise) is abstracted away from you as a customer. A more fair comparison would be Google Compute Engine, which is an "infrastructure as a service" product like AWS.
I've been using AWS for years, and I think it's great that it's getting to be mature. I'd be cautious about comparing cloud vendors by the number of services they offer though.
As cloud users and developers, we should be pushing for standardization between vendors. I dislike having a huge ecosystem of proprietary cloud products to learn before I can do anything useful with them.
I try to stick to IaaS offerings because it's easier to evade vendor lock-in that way. I'm happy to see Docker taking off, because it gives us a useful abstraction and platform that we can take with us across cloud vendors. (And isn't it interesting that AWS is rolling out their own Docker management platform? I think that's a good thing, but it's also true that AWS has an incentive to compete with more open systems like Kubernetes rather than adopt one.)
I come from a sysadmin background and regularly have to compare AWS, Azure & GAE. My company is a PHP shop which uses only IaaS & PaaS.
Basically nothing beats AWS on IaaS, and Azure seems to be the best for PaaS. I'm comparing in terms of cost, features, and ease of implementation/maintainability.
Can you elaborate re. Google? Google App Engine is more comparable to Heroku, so as a "longtime AWS user" you might be more comfortable with Google Compute Engine.
But the surprising thing, even to Hamilton, was that network availability went up, not down.
I'm not sure why that was a surprise; KISS applies just as well to networking as to most other parts of the data center.
Networking, and storage to a lesser extent, seem to have enterprise-itis. By that, I mean the companies involved do everything in their power to maintain their high margins. One of the ways they do this is adding features that everyone just has to have (or so sales will tell you), even when many of those features are things that should _NEVER_ actually be used in practice, due to their effects on performance or reliability (take WAN accelerators, for example, otherwise known as how to make your network appear faster 90% of the time and completely non-functional for the rest).
What I really wish is that the Amazons, Microsofts, Googles, Facebooks, etc. would get together and actually release these switches and related gear they are building. I imagine they never will, because they view it as a competitive advantage, but it sure would be nice to have some of this stuff available for medium-sized data centers without having to spend $$$$$ with the established vendors just to get something that can do 100Gbit.
Facebook has released designs for everything from servers and racks to entire data centers as part of the Open Compute Project (which has many more member organizations than just Facebook): http://www.opencompute.org/
Great article. Amazon is able to produce a more reliable datacenter by creating network gear specifically for their use case. By focusing on just what they need, they can get rid of all the bloat, complications, and expense of general network systems. The simpler the system, the more reliable it is. Another reason general computing will all shift to the cloud.
A fantastic overview, ranging from regions (the number of them and how they're organized) all the way down to their new latency reducing network interface card.
In Ireland, Amazon bought a Tesco distribution centre [0] for AWS. Tesco vacated it for another building [1] which has one of the largest floor spaces in Europe.
I wonder how much of this custom tweaking ends up back in the Linux community due to the GPL? If I am interacting with a GPL hypervisor, can I request the sources for it, for instance? I mean, AWS is a huge Linux success story; does anyone know if Amazon gives anything back?
I think Amazon is just riding the tiger here. One thing to remember is back in the day they were massively over provisioning just to support their peak loads during Thanksgiving and Christmas. All that spare capacity became AWS. I bet their utilization rates aren't any different today ...quite possibly worse.
> massively over provisioning just to support their peak loads during Thanksgiving and Christmas.
Not really true. As part of an annual capacity planning exercise each team was required to plan and scale for holiday peaks. Infrastructure is not "free" and each team has to optimize for good performance at an affordable cost.
> All that spare capacity became AWS.
Not true, never was. I regularly reviewed and provided feedback on the original narrative document for AWS. While I don't have a copy handy, I am absolutely certain that the document was focused on providing infrastructure services to developers.
> I bet their utilization rates aren't any different today ...quite possibly worse.
I don't have access to those numbers, and have no permission to share them even if I did. Your thought model for utilization needs to take the EC2 Spot Market into account. Savvy users of EC2 have learned to optimize their large-scale compute jobs and their bidding process to gain access to what would otherwise be (to your point) underutilized capacity.
I am sure they don't release numbers on how many "clever users" they have either.
The way I think about it is: can the utilization rate grow at the same rate as (or faster than) the machine count at their data centers, and for how long?
With all the levels of virtualization available and their market leadership today, that curve can look quite magical, I accept. But for how long? It seems quite a shaky curve to be betting 5 million machines on.
The excess capacity story is a myth. It was never a matter of selling excess capacity; actually, within 2 months after launch AWS would have already burned through the excess Amazon.com capacity. Amazon Web Services was always considered a business by itself, with the expectation that it could even grow as big as the Amazon.com retail operation.