Why ALIAS-type DNS Records Break The Internet (iwantmyname.com)
38 points by alexbilbie on Jan 8, 2014 | 41 comments



Amazon Route 53's ALIAS implementation takes a different approach. We only allow ALIASes to data that we know about authoritatively; so with Route 53 you can ALIAS to an S3 website bucket (which also lets you do HTTP redirects), a CloudFront distribution (which can serve as a bridge to any arbitrary domain you care about), an ELB, or to other records in your zone (which lets you combine routing policies in a compositional way).

The main benefits are:

   No dependency on a third-party DNS service; we stand behind our 100% SLA

   DNS-based routing policies still work correctly

   No delay in responding to health check failures

but there's another (future, for us) ancillary benefit too:

   Compatibility with offline-signed DNSSEC

The big downside of course is that the record has to be in our system in order to ALIAS to it. For HTTP services, using CloudFront is a pretty good workaround; it can handle dynamic and static sites. If you merely want to redirect from your apex to your www. domain, then S3 with a redirect works great too.

We're also open to enabling ALIASing to other zones hosted in Route 53 on a case-by-case basis. If you have a multi-tenant service and you're managing (or willing to manage) a zone on Route 53 with something like [customer-or-resource-identifier].yourservicename.com, we can enable ALIASing to those names. If you're interested, get in touch via the limit increase process: http://docs.aws.amazon.com/general/latest/gr/aws_service_lim... .

Obvious question: why don't we enable ALIASing across zones by default? Firstly, zones which can be aliased to have to be replicated and available in all shards of our partitioned datastore (some day we may use some kind of cross-shard query protocol to resolve that, but even then we'd prefer to minimize the traffic). Secondly, we want to ensure some stability for ALIAS targets and avoid situations where targets may be discontinued or deleted by their owners, leaving our mutual customers stranded.
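
For concreteness, here's a minimal sketch of what creating such an ALIAS looks like through the public API, using the boto3 SDK; the hosted zone ID, the target's hosted-zone ID and the distribution domain are placeholders you'd look up for your own resources, not anything from this thread:

  # Sketch only: UPSERT an apex ALIAS pointing at a CloudFront distribution.
  import boto3

  route53 = boto3.client("route53")
  route53.change_resource_record_sets(
      HostedZoneId="Z_EXAMPLE_ZONE",                        # your hosted zone (placeholder)
      ChangeBatch={
          "Changes": [{
              "Action": "UPSERT",
              "ResourceRecordSet": {
                  "Name": "example.com.",                   # the zone apex
                  "Type": "A",
                  "AliasTarget": {
                      "HostedZoneId": "Z_TARGET_ZONE",      # the alias target's zone ID (placeholder)
                      "DNSName": "d111111abcdef8.cloudfront.net.",
                      "EvaluateTargetHealth": False,
                  },
              },
          }]
      },
  )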


I host a high profile news site on AWS infrastructure.

> The big downside of course is that the record has to be in our system in order to ALIAS to it.

I need DDoS protection as the site gets attacked from time to time. I went with CloudFlare since ELB doesn't say much, if anything, about DDoS protection. This means I can't use Route 53, so I cannot work around apex issues, and I can't point my apex to ELB either.

The suggested workaround I keep coming across is to poll the ELB IP and update it at CloudFlare via an API, though I am not sure how well that would work if someone cached the IPs for whatever reason (even though, as I read somewhere else, ELB will honor requests for up to one hour after an IP change).

Do you have any suggestions, either for DDoS mitigation or for finding a workaround to my apex woes?


We've opened up a little lately, and some of our DDoS mitigation details are now available in my colleague Nate's talk from Re:Invent: www.youtube.com/watch?v=V7vTPlV8P3U .

What I can repeat from that talk is that we've handled DDoS attacks on the same order of magnitude as the largest we've ever heard of anywhere, and that we use the same mitigations ourselves for our own services.

We'd certainly be interested in a conversation about your needs, attacks you've experienced, and how we can meet them. (e-mail: colmmacc@amazon.com).

ELB is a multi-IP service and can dynamically scale from 1 to ~100 IPs. In general, when scaling down, IPs are retired from DNS but aren't removed from an ELB until we've seen traffic drain (which would explain the hour, but it's variable). If you want to know all of the IPs for an ELB, you can query:

   all.[elb-name].[region].elb.amazonaws.com
   all.ipv6.[elb-name].[region].elb.amazonaws.com
and you'll get them all. Though we strongly recommend against CNAMEing to those names directly, it can be useful if you need to scrape the IPs systematically for some reason (and it sounds like this might be such a case). I'm not sure what other workarounds may be available via CloudFlare.
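
If you do end up scraping them, here's a minimal sketch with the dnspython library; the ELB and region names are placeholders, and pushing the result to CloudFlare's API is left out:

  # Sketch: enumerate an ELB's current IPs via the "all." names described above.
  import dns.resolver

  def elb_ips(elb_name, region):
      name = "all.{}.{}.elb.amazonaws.com".format(elb_name, region)
      try:
          return sorted(rr.address for rr in dns.resolver.resolve(name, "A"))
      except dns.resolver.NXDOMAIN:
          return []

  # Poll this periodically and push any changes to your proxy/DNS provider.
  print(elb_ips("my-load-balancer-1234567890", "us-east-1"))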


We specifically tried using CloudFront, solely for its ability to serve as the target in an ALIAS record. However, CloudFront is useless for dynamic sites because it does not support passing the original Host header back to the origin. Any site that serves localized content (including localized redirects) is doomed by this inadequacy.


I'll pass this on to the CloudFront team. CloudFront distributions are free, so would it be possible to use a different distribution for each hostname? Or is this more of a wildcard setup?


I don't see how the caching thing is a problem specific to alias records. If you were using an A record instead, you would have to manually update the IP address, and then it would still be cached for however long.

The geoIP thing is pretty minor; ideally you should be able to tell the DNS server the IP you are proxying for.

Who cares that there's no standard? Complete non-issue.

Even if I agreed that ALIAS records were bad, what's the alternative? Manually updating A records?


If I point foo.mydomain at bar.otherdomain with a CNAME, otherdomain can transition to a new IP without doing anything more than is required to make bar.otherdomain work.

If I point foo.mydomain at whatever bar.otherdomain's IP is (i.e. use an A record), otherdomain breaks my stuff whenever they change IP addresses, and there's not a lot they can do about that. Automating this (via the ALIAS mechanism mentioned in this article) just means that the breakage eventually goes away.

The correct solution, as pointed out, is to use CNAME - which requires that you use a subdomain.


The problem you cite is only a problem if the operators of otherdomain are either not aware of what you're doing, or otherwise don't want to support it, or are unable to handle a transition. If they are aware, it's generally a non-issue: They can ensure the site works on both addresses for a transition period.


Can you illustrate this with an example? How about explicit records and a 60-second TTL on the A and CNAME, to humor me.

If implementing recursion-based ALIAS, foo.mydomain RRs MUST respect the TTL of bar.otherdomain's RRs. With that proviso, I'm missing what the effective cache lifetime difference is between the two.


I agree that this would work decently well if you could actually specify a 60-second TTL; but there are quite a few DNS resolvers that cache responses for at least a day, and there's not a whole lot you can do to change that. (This makes some sense - re-resolving all the time doesn't help performance!)


"Quite a few" is actually a very very small percentage these days.

If your ISP is disregarding TTL, you're just as bad off with CNAMEs pointing at a third party as you are with the ALIAS-type record.


Yeah, around 1% of resolvers seriously disregarding TTLs is what I've seen. But if that's the argument, there's absolutely no difference in cache lifetime between a single A backed by an ALIAS resolution and the typical A + CNAME. In either case all of the records should be cached for the broken resolver's lifetime. To counter that you do DNS cache busting with prepended GUID labels or the like, which totally breaks caching anyway.


> If implementing recursion-based ALIAS, foo.mydomain RRs MUST respect the TTL of bar.otherdomain's RRs.

There are only two ways that I can think of. The nameserver could set foo.mydomain's TTL to the remainder of bar.otherdomain's currently cached TTL, but that would mean that at some point it could go as low as 1 second.

Or the nameserver could do a fresh resolution of bar.otherdomain for each request and forward that as the answer for foo.mydomain with bar.otherdomain's TTL intact. However, that would make requests for foo.mydomain quite slow and severely increase the workload on the nameserver.

Given the problems with each of those solutions, it seems the common approach is to simply put a short TTL on foo.mydomain regardless of the TTL on bar.otherdomain, and then pray. Thus this article.
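
For illustration, here's a minimal sketch of that second approach (a fresh resolution per query, passing the target's TTL through), using the dnspython library and the placeholder names from this thread; it's not meant to represent any provider's actual implementation:

  # Sketch: resolve the ALIAS target at query time and inherit its TTL.
  import dns.resolver

  def resolve_alias(alias_target):
      """Return (addresses, ttl) for the ALIAS target, keeping its TTL intact."""
      answer = dns.resolver.resolve(alias_target, "A")   # fresh lookup on every query
      addresses = [rr.address for rr in answer]
      return addresses, answer.rrset.ttl                 # serve the apex answer with this TTL

  # e.g. synthesize the answer for foo.mydomain from bar.otherdomain
  ips, ttl = resolve_alias("bar.otherdomain")
  print(ips, ttl)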


What's the difference between the ip changing via ALIAS and ip changing via the next hop after CNAME? In both cases you'll be stuck with the old address until the record times out in your cache, so either the provider has to make sure the old address works correctly for the whole TTL period after the record change, or you're going to hit the old/dead host.


Using a subdomain? What's wrong with "www."? For the apex all you need then is a tiny server that issues a permanent redirect. Why are people so obsessed with getting rid of "www."? The bare apex causes issues with cookies, makes SSL a lot harder, and is currently badly supported by DNS.
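
That redirect server really can be tiny; a sketch using only the Python standard library, with www.example.com standing in for the real site:

  # Sketch: permanently redirect everything on the apex to the www host.
  from http.server import BaseHTTPRequestHandler, HTTPServer

  class RedirectToWWW(BaseHTTPRequestHandler):
      def do_GET(self):
          self.send_response(301)                                   # permanent redirect
          self.send_header("Location", "http://www.example.com" + self.path)
          self.end_headers()

  if __name__ == "__main__":
      HTTPServer(("", 80), RedirectToWWW).serve_forever()           # port 80 needs privileges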


> What's wrong with "www."?

People (rightly) expect the URL to work without it. I say rightly, because it can, so it should. The www adds no value to the user.


That's an extremely web-centric view of the world. There is more to DNS than its use in URLs. And doesn't redirecting to www. do the expected thing anyway?


To redirect to www.example.org I need a server listening at example.org. If I have www.example.org on heroku because I don't want to run a server, I suddenly need a server just for that redirect that I need to have because I don't want to run a server.


Or just use http://wwwizer.com/

EDIT: Or use DNSMadeEasy, they provide HTTP redirect options as part of their service (although I wish the default option would not be to wrap the target URL in an iframe...)


Or use an S3 bucket with a redirect.
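
As a minimal sketch of that (boto3 SDK, placeholder names; with S3 website hosting the bucket name has to match the apex domain):

  # Sketch: an S3 "website" bucket whose only job is to redirect to www.
  import boto3

  s3 = boto3.client("s3")
  s3.create_bucket(Bucket="example.com")            # bucket name = apex domain
  s3.put_bucket_website(
      Bucket="example.com",
      WebsiteConfiguration={
          "RedirectAllRequestsTo": {"HostName": "www.example.com", "Protocol": "http"}
      },
  )
  # Then point the apex at the bucket's website endpoint (e.g. a Route 53 ALIAS).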


But there's already a non-Heroku server running for that domain, i.e. the one serving DNS requests. That server can be re-purposed to do HTTP-level redirects, too, and indeed many name server operators offer that feature.


It does, but then can you do that without using an ALIAS record, which is one of the reasons invoked to use "www"?

Also, web-centric is fine; browsing the net is 99% of my URL manipulation, and I'm a developer who has to work with web APIs.


For a high-availability site, you need a cluster of redirect servers behind the same IP. For the reasons you name, it's generally still worth redirecting web traffic to www.*, but it's trivial only for a non-HA site.


I'm missing quite a lot in this article. Personally I have my reservations about ALIAS. Recursion-backed implementations in particular are full of dragons and sharp corner cases. It's a shame that it's missing any substantial criticism or examples of poor implementations of ALIAS records.

  An authoritative nameserver ... can deliver records in a predictable speed
That's a nice-to-have. And totally unrelated to the AA bit. Low query latency is usually achieved via caching, then querying multiple/the fastest available authoritative NSes.

  An ALIAS record resolves on request the IP address of the destination record and serves it as if it would be the IP address for the apex domain requested
First, backing with DNS resolution is just one implementation of the idea. Secondly, it is the IP address of the domain requested; an authoritative NS setting AA makes it so. There's nothing that specifies what the backing data store or resolution method is for AA answers. Implementation detail.

   you will send traffic for your mapped apex domain to the wrong address until the record expires in all caching resolvers.
Now we've discovered, but not actually mentioned, TTLs. How is this any different than the TTL & expiry on a traditional CNAME + A record chain that's proposed at the end?

  you request the IP address from the nameserver of your DNS provider, not from your actual location.
Assuming the implementation is backed by DNS recursion, sure. Good thing there's a standard like EDNS client-subnet that provides a method to propagate, and vary answers based on, the network of the original requester. But point taken: DNS is a complicated protocol and you should probably understand it before developing new features and implementations.
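
As a rough sketch of what that could look like for a recursion-backed ALIAS (dnspython library; the upstream nameserver address and client IP are placeholders, and real deployments have much more to worry about):

  # Sketch: attach the original requester's subnet via EDNS client-subnet (ECS)
  # when recursing to resolve an ALIAS target.
  import dns.edns
  import dns.message
  import dns.query

  def resolve_with_client_subnet(target, client_ip, upstream="198.51.100.53"):
      ecs = dns.edns.ECSOption(client_ip, srclen=24)          # announce the client's /24
      query = dns.message.make_query(target, "A", use_edns=0, options=[ecs])
      return dns.query.udp(query, upstream, timeout=2.0)      # upstream can now geo-route

  response = resolve_with_client_subnet("www.example.com", "203.0.113.10")
  print(response.answer)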


> How is this any different than the TTL & expiry on a traditional CNAME + A record chain that's proposed at the end?

The TTL on a CNAME is about caching the canonical name, not the IP address.

Example with CNAME:

The client asks the authoritative nameserver for example.com about bar.example.com. and gets back a CNAME to foo.example.com. with a TTL of 2 days. The client stores this in its resolver cache.

The client then goes to ask about foo.example.com. and gets back an A record for 203.0.113.3, with a TTL of 2 hours.

The client cache looks like this: "bar.example.com. CNAME foo.example.com." that expires in 2 days, and "foo.example.com. A 203.0.113.3" that expires in 2 hours.

If the domain foo.example.com. changes its A record, it will be reflected within 2 hours for clients who use bar.example.com.

Example with ALIAS:

The client asks the authoritative nameserver for example.com about bar.example.com. and gets back an A record for 203.0.113.3 with a TTL of 2 days.

The client's resolver cache will be a single entry: "bar.example.com. A 203.0.113.3" that expires in 2 days.

If foo.example.com. changes its IP address, its TTL is not considered by clients who use bar.example.com., because it's not in their cache.
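
You can see the two separately-TTL'd entries from the CNAME case with a quick dnspython sketch (bar/foo.example.com are just the placeholder names from this example and won't actually resolve):

  # Sketch: each rrset in the CNAME chain carries its own TTL.
  import dns.resolver

  answer = dns.resolver.resolve("bar.example.com", "A")
  for rrset in answer.response.answer:          # typically a CNAME rrset, then an A rrset
      print(rrset.name, rrset.ttl, rrset)

  # In the ALIAS case there is no CNAME rrset at all: the answer is a plain A
  # rrset whose TTL is whatever the ALIAS provider chose, not the target's TTL.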


I see your point; ALIAS records are not a CNAME substitute and should not be treated as one. However, they do have their uses and the risk is minimal if you understand what ALIAS records are doing and how to use them.

I'm having a bit of a hard time following your example, but I believe if the authoritative DNS provider simply inherited the TTL of the alias target then the behavior would be as desired. This gets back to donavanm's point that this is more an issue with the way ALIAS records are implemented than a problem with the concept of ALIAS records.

One other issue that the author doesn't touch on is that we now have 3 implementations of ALIAS records that I'm aware of (Route 53, GitHub, and Heroku) and there are differences in how they behave. Yet we have those 3 providers using the same name, ALIAS, to describe similar but significantly different things. This is clearly confusing and potentially disastrous for users.


In general, implementing ALIAS records through recursive resolution is a bad idea.

Geo routing and other resolver-IP-based policies will probably break, since the target's authoritative name server has no way of knowing the original resolver's IP (it only sees the forwarding name server). EDNS client-subnet does not suffice in this case, since the authoritative name server may still base part of its decision on the resolver IP.

A much bigger problem is that the forwarding name server can be used for DDoS amplification attacks, and that its own resources can easily be exhausted by an attacker if the authoritative name server is slow to respond. If the forwarding name server opens a new socket for every query it makes, the set of available port numbers can easily be exhausted. If the forwarding name server reuses port numbers, then spoofing attacks become straightforward.

This does not mean ALIAS records are a bad idea in general. The Amazon Route 53 implementation resolves aliases internally. It is therefore limited to AWS services, but you could, for example, point an ALIAS to CloudFront and point CloudFront to your website. CloudFlare offers a similar service.


Am I the only one who found it funny that the article is on a site without a subdomain?


They are a DNS provider, so they have to use A records anyway.


Can someone explain the rationale in the standard for disallowing CNAMEs on apex domains?

Before I learned about this restriction I actually had my site configured like that (my DNS provider had poor validation...) and it seemed like it worked for at least 2/3 of users (who apparently had lenient DNS servers).


CNAMEs are name-level aliases. So if, for example, I make the following query:

  name=www.example.com type=A
and get a response of the form:

  www.example.com 3600 IN CNAME www.example.org 
a resolver will then recurse and look up name=www.example.org, type=A. The CNAME indirection is cacheable for all queries to www.example.com, though, of any type. So if I make a query for:

   name=www.example.com type=AAAA
the resolver can skip any query to the nameservers for www.example.com and go straight to a query for name=www.example.org, type=AAAA. So in effect a CNAME "masks" all records of any type, for a given name.

At zone apexes this becomes a problem. Zone apexes are required to have SOA and NS record sets, which play important roles in record-not-present responses and nameserver discovery, respectively. Another problem is masking MX records, but you could always copy the MX records at the target name to work around that.
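
A quick way to see what a CNAME at the apex would end up masking is to dump the record types that already live there; a small dnspython sketch, using the IANA example.com domain just for illustration:

  # Sketch: record sets present at an apex; a CNAME would stand in for all of them.
  import dns.resolver

  for rdtype in ("SOA", "NS", "MX", "A"):
      try:
          answer = dns.resolver.resolve("example.com", rdtype)
      except dns.resolver.NoAnswer:
          continue                                  # that type isn't present at this apex
      for rr in answer:
          print(rdtype, rr)
  # The mandatory SOA and NS sets are exactly what a CNAME at the apex would mask.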


I first had to read the initial link in the OP. From there, I understand that this so-called ALIAS record is specific to one (or more?) DNS provider, called dnsimple.com.

Broken? Maybe. I've never used one. The problem with the apex record not being able to be a CNAME is easily avoidable. I'd prefer to stay RFC compliant rather than use some one-off.


What's the consensus here? http://zx2c4.com or http://www.zx2c4.com ? For a while I did the latter, then I switched to the former, and now I can't make up my mind and am tempted to go back to the latter.

Opinions? Thoughts?


I prefer the latter because that way you can have cookies set only for the site subdomain and serve static elements from a cookieless subdomain, instead of a different domain altogether. Just add a redirect from the naked domain and you're golden.


Compelling enough.


Why don't admins just set a smaller TTL for anything that is aliased?


This seems kind of funny given that GitHub's most recent advice was to use ALIAS DNS records for apex domains.


Can anyone explain the difference between an ALIAS and a DNAME record?


> DNAME record?

You mean a CNAME?

belorn gave a pretty good explanation here: https://news.ycombinator.com/item?id=7023412


A CNAME and an ALIAS both reference a specific label/name. A DNAME applies to a whole subtree and its subcomponents. Wikipedia has a decent example of DNAME usage.


An ALIAS returns an IP; a CNAME returns a name.



