Hacker News
Multi-Region Latency Based Routing now Available for AWS (aws.typepad.com)
89 points by jeffbarr on March 22, 2012 | 36 comments



I am utterly floored by the frequency at which new features are being released for AWS... Are you Amazon people getting any sleep?


Five hours does it for me. And I have to write all of those posts!

As one of my colleagues mentioned, we are hiring (currently 516 open jobs on the AWS team). Here's a list that's easier to read:

http://awsmedia.s3.amazonaws.com/jobs/all_aws_jobs_list.html


And if you want to consume the fruits of their labor, Netflix is hiring too! http://jobs.netflix.com/

(Sorry Jeff, I couldn't resist)


And if you are looking for really cool jobs in which you can apply all your distributed systems and cloud skills on amazingly interesting products, beyond just video streaming, check out Amazon.com (75+ pages of 20 jobs per page in software development alone)...

http://www.amazon.com/gp/jobs/ref=j_sq_btn?keywords=&cat...

(sorry Jeremy, couldn't resist) :-)


Well, given that we're doing this here now, for anyone that takes advantage of AWS... ;)

I'm looking for engineers and UX people to come help me make the Heroku Add-ons platform even more amazing. If putting these cloud services into the hands of developers and changing the way people think about provisioning these services sounds like something you'd like to be part of: glenn at heroku dot com


What's it worth to do great work if you can't live in a great city? Amazon's Development Center in Cape Town, South Africa is situated in the heart of the Mother City, and is surrounded by the Atlantic Ocean and Table Mountain - you can't get better views. Combine this with never-ending beaches and sunny weather and you have the perfect work and play environment.

We build software for AWS and are looking for all sorts of engineers: from kernel development to building great web front-ends for our customers. Check out http://www.amazon.co.za/ for more info.


Yes, we sleep.

Come join us and be part of the fun that is AWS: http://aws.amazon.com/careers/

If you can name it, we are probably hiring for it.


They've been growing and hiring like crazy for the last 2 years or so.


So I'm a little fuzzy on how DNS works and this seems like a good place to ask:

I was under the impression that end users typically talked to DNS cache servers rather than directly to the authoritative servers in the domain's registration. If that's true, how can AWS provide different records based on the requesting user?

If end users are talking directly to one of the ~4 authoritative name servers listed in the registration, how does that scale to billions of queries?


The most important thing going on here is Anycast. See http://en.wikipedia.org/wiki/Anycast for a complete description, but basically, multiple machines have the same IP address, and different routes are advertised to different networks. This results in packets being routed to the nearest DNS server.

Then, that DNS server can decide where the other end is and provide them with the correct IP.

You're right that most users use a caching DNS server, so it is actually the location of the DNS caching server that matters. Of course, since most users use a DNS server close to them (usually their ISP's), this should still be a good approximation. If you're using, say, 8.8.8.8 (Google Public DNS), that's (probably) more than one server -- you're using the closest one to you. So then AWS will provide you with the AWS region closest to the nearest Google DNS server, which is hopefully the closest one to you.


As mcpherrinm described, anycast plays a big part in how queries are routed to a DNS server in Route 53's global fleet.

You are right that a DNS server sees the query coming from the resolver instead of the user. So how does it pick the region closest to the user? As the blog post describes, we measure latencies from client networks to AWS regions and we also have a mapping of which resolvers are used by which client networks. If you put both of them together, you can compute which region is closest to the users of the resolver.
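A toy sketch of how those two datasets might combine. All of the data and names here are made up for illustration; the real measurement and mapping pipeline isn't public.

```python
# Hypothetical: measured latencies from client networks to regions, plus a
# resolver -> client-network map, combine to pick the lowest-latency region
# for a given resolver. Data below is invented for illustration only.

# Measured median latency (ms) from each client network to each region
LATENCY_MS = {
    "203.0.113.0/24": {"us-east-1": 85, "eu-west-1": 20, "ap-southeast-1": 190},
    "198.51.100.0/24": {"us-east-1": 30, "eu-west-1": 95, "ap-southeast-1": 210},
}

# Which client networks are observed using which resolver
RESOLVER_TO_NETWORKS = {
    "192.0.2.53": ["203.0.113.0/24"],
    "192.0.2.99": ["198.51.100.0/24"],
}

def best_region(resolver_ip: str) -> str:
    """Pick the region with the lowest average latency across the
    client networks known to sit behind this resolver."""
    networks = RESOLVER_TO_NETWORKS[resolver_ip]
    regions = LATENCY_MS[networks[0]].keys()
    return min(
        regions,
        key=lambda r: sum(LATENCY_MS[n][r] for n in networks) / len(networks),
    )

print(best_region("192.0.2.53"))  # clients behind this resolver sit near Europe
```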


we also have a mapping of which resolvers are used by which client networks

How did you build this map?

I can think of a few complicated ways to go about it, but I'm wondering if there is something easy I'm missing.


Take a look at http://whatsmyresolver.stdlib.net/ for one simple way.

Full disclosure: I work on Amazon Route 53, and although we don't quite use that same method, it will give you an idea of what's possible. PS: we're hiring.


How does it work?

1) I visit the site, it gets my IP address

2) Magic happens

3) It displays my nameserver address.

What is going on in step 2?

Edit: worked it out.

For those interested, it uses a JavaScript include from a unique subdomain name. Because the subdomain is unique, the app can work out the relationship between client IP and resolver.


The HTML page at http://whatsmyresolver.stdlib.net/ loads a JavaScript file from the URL http://whatsmyresolver.stdlib.net/resolver/.

Fetching http://whatsmyresolver.stdlib.net/resolver/ triggers a 302 redirect to a URL of the form:

  http://$guid.nonce.stdlib.net/resolver/

The DNS server authoritative for nonce.stdlib.net has a simple wildcard configured, so everything under *.nonce.stdlib.net resolves to the same web server. Obviously the DNS request for the globally unique id domain name has to come before any HTTP request to the guid URL, so when the DNS request comes in, the authoritative server can record it in a simple lookup store (guid -> resolver source IP).

Then, when the HTTP request makes it to the web server, it can inspect the Host: header to determine what the guid was. It then uses this guid to correlate the HTTP request it is handling with the resolver source IP, and generates some JavaScript with the data we need:

  var resolver="192.0.2.53";var edns=true;
It's just a hack I wrote up for my own reasons years ago. But if you'd like to avail of it for any reason (e.g. helping end-users debug things), feel free to embed:

  <script language="javascript" src="http://whatsmyresolver.stdlib.net/resolver/"/>
and use the variables it populates. No warranties or guarantees implied :-)
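The guid correlation described above could be sketched roughly like this. The real service's internals aren't public, so this is just the idea in code, with invented IPs and names:

```python
# Toy sketch of the guid-based client/resolver correlation: the DNS side
# records who asked for $guid.nonce.stdlib.net, and the HTTP side pairs
# that with the browser via the Host: header.

class ResolverMapper:
    def __init__(self):
        self._guid_to_resolver = {}  # guid -> resolver source IP

    def on_dns_query(self, qname: str, source_ip: str) -> None:
        # Called when the authoritative server sees a query for
        # $guid.nonce.stdlib.net; record which resolver asked.
        guid = qname.split(".")[0]
        self._guid_to_resolver[guid] = source_ip

    def on_http_request(self, host_header: str, client_ip: str) -> str:
        # The Host: header carries the same guid, so we can now pair the
        # browser's IP with the resolver that queried for it, and emit
        # the JavaScript the page will use.
        guid = host_header.split(".")[0]
        resolver_ip = self._guid_to_resolver[guid]
        return 'var resolver="%s";' % resolver_ip

mapper = ResolverMapper()
mapper.on_dns_query("abc123.nonce.stdlib.net", "192.0.2.53")   # DNS arrives first
print(mapper.on_http_request("abc123.nonce.stdlib.net", "203.0.113.7"))
# -> var resolver="192.0.2.53";
```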


We use a similar method at Facebook.

http://www.facebook.com/note.php?note_id=10150212498738920

If I had to guess, I'd say that since Route 53 is a DNS host for many domains, they might be able to work out the user IP / resolver map passively. Pretty awesome stuff, Amazon! This is a big deal for your customers.


> The most important thing going on here is Anycast. ... This results in packets being routed to the nearest DNS server.

Nearest in terms of AS path. This can be wildly different than latency as BGP isn't really implemented with cost/performance in mind. Anyone relying on anycast to answer their latency story is doing a lazy/poor/ignorant job of it.

> You're right that most users use a caching DNS server, so it is actually the location of the DNS caching server.

And another case where end users would be better served by providers implementing edns client-subnet extensions. This would allow savvy content providers to route to the actual end user, and not their resolver.


Anycast is good in the absence of better options, but I don't think this is Anycast.

It sounds like here Amazon is doing unicast DNS, but dynamically changing the response depending on the IP address the query comes from.

Since they have a big database of IP addresses and latency, they can choose the best response.

The missing thing here is that they don't get the IP address of the client computer, only the DNS server. In many cases that (along with geo-location of the DNS Server) might be enough, but it is possible to do better. I'd speculate what they do here is correlate the IP addresses with the DNS server either by subnet or by using active measurement techniques.


The caching servers are typically close to the user: at your ISP, in your office's server closet, in your data center, etc. In the case of public recursive DNS providers such as OpenDNS and Google Public DNS, this requires a recent extension to DNS: http://arstechnica.com/telecom/news/2011/08/opendns-and-goog...


Amazon is such a tease. Just give us Anycast Elastic IPs that work in multiple regions already!

There's still no good high-availability story for region-wide outages, and there won't be until they do this.


Anycast IPs would be lovely, but they're tweaky enough to use that I suspect the support burden would be high (since they suck for long-lived connections). And they're not nearly as easy to provision as a complicated DNS setup.

They need to just stop having region wide outages...


Seems unlikely. Not every AWS user is in every region. They would need a separate range of anycast IPs for every possible combination of regions (2^N possibilities) to prevent packets from going into a black hole. If they wanted to be able to serve a conservative number of hosts in every region (say 32768), that would require 15 bits of IP space per combination. If AWS expands to 10 regions, that means 25 bits of the IPv4 space. A /8 only gives you 24.

I guess an alternative might be to do some kind of NAT for unbound anycast addresses which forwards packets to an available region, but that is hugely complicated.
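For the curious, the arithmetic in the parent comment works out like this:

```python
# 2^N combinations of regions, each needing 15 bits (32768 hosts) of
# address space, versus what a /8 can actually hold.
regions = 10
hosts_per_combination = 32768            # 2**15, i.e. 15 bits per combination
combinations = 2 ** regions              # every possible subset of 10 regions
total_addresses = combinations * hosts_per_combination

print(total_addresses)   # 2**25 = 33554432 addresses, i.e. 25 bits
print(2 ** 24)           # a /8 only holds 2**24 = 16777216
```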


I'd be delighted if they offered it for even just two regions. Right now, if you want your service to be highly available, you either run your own routers in two colocation facilities and forward to AWS, or you don't use AWS.


I created a quick demo at http://region.strewth.org/ and an introspective TXT record at region.strewth.org. More details at http://www.strewth.org/words/.


This is one feature that I've been waiting on for a while. Otherwise, multi-region load balancing had to be done in the application, via geo-IP, or through a third-party provider.

The only thing I'm curious about is what kind of measurements Amazon is gathering, and how it is gathering them. Is it using ELB and looking at TCP latency (the delta between SYN and ACK)? Curious minds want to know...
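One guess at what such a measurement might look like: time how long a TCP handshake takes, since connect() returns once the SYN/ACK round trip completes. This is just a sketch of the idea, measured here against a local listener so it's self-contained (so the number will be near zero):

```python
import socket
import time

def tcp_connect_latency(host: str, port: int) -> float:
    """Return the TCP connect (three-way handshake) time in seconds."""
    start = time.monotonic()
    with socket.create_connection((host, port), timeout=5):
        elapsed = time.monotonic() - start
    return elapsed

# Local listener purely so the example can run anywhere.
listener = socket.socket()
listener.bind(("127.0.0.1", 0))   # port 0 -> OS picks a free port
listener.listen(1)
port = listener.getsockname()[1]

latency = tcp_connect_latency("127.0.0.1", port)
print("connect took %.1f ms" % (latency * 1000))
listener.close()
```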


Sorry for bringing up this off-topic issue on AWS: is it possible to set hard caps on monthly spending in AWS? I just asked this [1] but didn't receive a single upvote =S. Thanks.

[1] http://news.ycombinator.com/item?id=3737595


That feature is currently in beta for our Premium Support customers. Here's a third-party blog post with more information:

http://blog.bitnami.org/2011/12/monitor-your-estimated-aws-c...


No, AFAIK AWS does not have the ability to restrict usage based on expenditure. There are startups that do this, though; check http://www.enstratus.com/.


You could set your own cap by monitoring usage on your server, and shut down services as needed. Maybe a new startup idea here for someone? We monitor our bandwidth here, and while we haven't had to deactivate services, it's something we're prepared for.
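A minimal sketch of what a DIY cap might look like. How you obtain your month-to-date spend is left out (there was no official API at the time); the decision logic itself is easy to test in isolation, and all the numbers here are hypothetical:

```python
# Hypothetical DIY spending breaker: trip it when spend hits the cap,
# or when the current run rate projects past the cap by month end.

def projected_month_end_spend(spend_so_far: float, day_of_month: int,
                              days_in_month: int = 30) -> float:
    """Linear projection of end-of-month spend from the run rate so far."""
    return spend_so_far / day_of_month * days_in_month

def should_shut_down(spend_so_far: float, day_of_month: int,
                     cap: float) -> bool:
    """True if we've hit the cap or are on pace to blow past it."""
    if spend_so_far >= cap:
        return True
    return projected_month_end_spend(spend_so_far, day_of_month) > cap

# $40 spent by day 10 projects to $120 for the month; a $100 cap trips.
print(should_shut_down(40.0, 10, cap=100.0))   # True
# $40 by day 20 projects to only $60; no need to shut anything down.
print(should_shut_down(40.0, 20, cap=100.0))   # False
```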


Update for those interested: it looks [1] like it'll never happen...

[1] https://forums.aws.amazon.com/thread.jspa?messageID=249000


No. Just keep an eye on it.. it's cheap :p


Except if you go on vacation to unplug for a week and have some bug or misconfiguration, couldn't you easily run up a bill in the hundreds or thousands when you expect it to be in the tens?

If this is so implausible then why can't they provide a cap? It feels to me that they are purposefully trying to capitalize on people's mistakes. Call me paranoid but it is honestly the primary reason why I've been wary to play with AWS.


Most sites with the kind of traffic that could potentially run up a bill of thousands of dollars will have checks in place to ensure a mistake like that wouldn't happen. Anything else would be irresponsible.. I'd challenge your assumption that Amazon is intentionally trying to capitalize on mistakes... A company as big as Amazon doesn't have the time or need to even consider something like that.. And yes, I think you're being paranoid.


A spending control is such an obvious feature, and wouldn't exactly be difficult for them to implement. For some toy project it doesn't have to be thousands of dollars for this to be an annoyance to me, I have no desire to get a surprise bill of $200 at the end of a month even though I can easily afford it.

Doesn't it seem bizarre to hand over your credit card number and agree to pay some undisclosed sum with no cap? If it's my money and I'm paying for some nonessential service, I expect to never be surprised at how much I'm charged, even if it's $10 when I expected $2.

> I'd challenge your assumption that Amazon is intentionally trying capitalize on mistakes

I actually worked at Amazon (not on AWS) and I've worked at a few other BigCo's and you might be surprised by how many things exactly like this they do have the time and desire to worry about. You have it backwards; for a company as big as Amazon it's worth paying an entire salary just to worry about things like this, since a fraction of a percent of increased AWS income will more than pay for itself. It's extremely likely they are at least trying to capitalize on people who sign up for the free tier and accidentally go over their limit, or else they would have a "trial" mode that only confirms your credit card and won't ever charge it without your approval.


> Most sites with the kind of traffic that could potentially run up the bill thousands of dollars will have checks in place to ensure a mistake like that wouldn't happen.

One thing that'd help dramatically would be exposing the various account activity and cost data via API. There's currently no API for accessing one's monthly spend to-date.

From a comment elsewhere, looks like they're sorta working on it, finally (CloudWatch metric).


I don't think Amazon is being dishonest, but you can see that, regardless of the individual degree of paranoia, there are quite a few small developers who don't try AWS just to be safe.

My point is that it makes business sense for Amazon to lure this old-fashioned crowd, and it could do so by implementing a simple feature such as spending control.



