When I made that design decision I wasn't considering the possibility of DNS outages at all; I was just thinking in terms of "there's a huge number of places between me and ultradns where someone could insert a spoofed DNS response".
Ah, my bad, sorry, I thought you meant in the Tarsnap client. Then I really don't know why people would make a fuss over you hardcoding the IPs in there. All you need to do is keep an eye out in case they change them (which you could even automate).
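For what it's worth, the "automate it" part could look roughly like the sketch below. It is only an illustration: the function names and the addresses (taken from the 192.0.2.0/24 documentation range) are made up, and nothing here reflects how Tarsnap actually does it. Given the spoofing concern, the result is probably best used to alert a human rather than to update the pinned list automatically.

```python
import socket


def resolve_ipv4(hostname):
    """Return the set of IPv4 addresses the system resolver gives for hostname."""
    infos = socket.getaddrinfo(
        hostname, None, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return {info[4][0] for info in infos}


def diff_pinned(pinned, current):
    """Compare hardcoded addresses with freshly resolved ones."""
    pinned, current = set(pinned), set(current)
    return {
        "stale": sorted(pinned - current),  # pinned, but no longer published in DNS
        "new": sorted(current - pinned),    # published in DNS, but not pinned
    }


# Hypothetical pinned list and lookup result; in a cron job the second argument
# would come from resolve_ipv4("<the endpoint hostname>").
report = diff_pinned(
    ["192.0.2.10", "192.0.2.11"],
    {"192.0.2.11", "192.0.2.12"},
)
if report["stale"] or report["new"]:
    print("pinned endpoints differ from DNS:", report)
```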
Are there DNS servers that support versioning? The best solution I could imagine would simply be to set Tarsnap to normally use the current DNS records for S3, but be able to roll back to the last valid zone record if it encounters an update that makes the servers stop resolving.
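I don't know of a server-side versioning feature, but the rollback idea can be approximated on the client side with a "last known good" cache. A minimal sketch, assuming a hypothetical `lookup` callable that returns a list of addresses and raises `OSError` when resolution fails (which is what `socket.gaierror` is a subclass of):

```python
class LastKnownGoodResolver:
    """Use live DNS answers when available, else fall back to the last good one."""

    def __init__(self, lookup):
        self._lookup = lookup  # callable: hostname -> list of addresses
        self._good = {}        # hostname -> last non-empty answer seen

    def resolve(self, hostname):
        try:
            addrs = list(self._lookup(hostname))
        except OSError:
            addrs = []  # treat a resolution failure like an empty answer
        if addrs:
            self._good[hostname] = addrs  # remember the fresh answer
            return addrs
        if hostname in self._good:
            return self._good[hostname]   # roll back to the last good answer
        raise LookupError(f"no current or cached answer for {hostname}")
```

Note that this only works around outages. It does nothing about a spoofed answer, which would be cached as "good" just like a real one.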
The Tarsnap client only talks to the Tarsnap server, but I do cache that lookup in order to avoid problems with glitchy DNS resolution.
I could have the Tarsnap server cache DNS lookups if I was only worried about working around DNS outages -- but as I said, that wasn't something I was considering at all when I made the decision to eschew DNS.
Mostly security -- DNS is one of the worst offenders when it comes to protocols with security problems, both in terms of protocol issues and bugs in implementations.
Amazon doesn't move endpoints very often, and I round-robin requests to several endpoints; so in the rare cases an AWS endpoint changes I see a slight decrease in Tarsnap performance and can make the Tarsnap server stop using the dead endpoint before any users are likely to notice.
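To illustrate the general idea (this is a sketch of plain round-robin with manual removal, not the actual Tarsnap server code; the class name and addresses are invented for the example):

```python
class EndpointPool:
    """Hand out endpoints in round-robin order; dead ones can be dropped."""

    def __init__(self, endpoints):
        self._endpoints = list(endpoints)
        self._next = 0  # running counter, reduced modulo the pool size on each pick

    def pick(self):
        if not self._endpoints:
            raise RuntimeError("no live endpoints left")
        endpoint = self._endpoints[self._next % len(self._endpoints)]
        self._next += 1
        return endpoint

    def mark_dead(self, endpoint):
        # Stop sending requests to an endpoint that has gone away.
        if endpoint in self._endpoints:
            self._endpoints.remove(endpoint)


pool = EndpointPool(["192.0.2.10", "192.0.2.11", "192.0.2.12"])
first = pool.pick()             # requests rotate across all three addresses
pool.mark_dead("192.0.2.11")    # later picks only use the remaining two
```

With N endpoints, one dead endpoint only affects about 1/N of requests until it is removed, which is why the failure shows up as a slight slowdown rather than an outage.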
It is DNS. If you put the EC2/S3 address into /etc/hosts, the services work fine. Apparently it's affecting lots of other big websites as well (Target, Salesforce) because they all outsource to UltraDNS.
NANOG chatter confirming it is an issue with UltraDNS. Seems to be west coast related.
EDIT: Potentially a DoS attack. From NANOG:
"We have some DNS providing type customers (not UltraDNS) receiving a few million packets/sec of UDP/53 DoS traffic, starting at about the same time as the UltraDNS problems. No clue if it's related, but it certainly sounds suspicious. :)"
The point of using UltraDNS is that they provide fast "real-time" failover of DNS routing. This is used, for example, for fault tolerance where the IP address responding to a domain name might change due to failure scenarios or load balancing (where a different server is now the primary responder for the domain name, maybe located in a different data center or country). Infrastructure-wise, UltraDNS is kind of like the Akamai of DNS, instead of content distribution.
I thought DNS was supposed to try backup servers automatically... any DNS experts able to explain what's going on? Some of the UltraDNS servers are returning (correct) values, others are simply not responding.
Yeah, uh, if you are smart, you have a secondary DNS provider. But that really requires managing it yourself. The problem was that many people outsource, which usually means going with only one provider. (Now, UltraDNS does have a good setup, and they probably aren't a bad choice for a provider, but having only one is just plain stupid.)
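A rough sketch of why the second provider matters. Resolvers do move on to the next listed nameserver when one times out, but that only helps if some server on the list is still answering. The `query` callable and the nameserver names below are hypothetical stand-ins, not a real resolver API:

```python
def query_with_failover(nameservers, query):
    """Try each nameserver in turn; return (server, answer) from the first that replies."""
    failed = []
    for server in nameservers:
        try:
            return server, query(server)
        except TimeoutError:
            failed.append(server)  # no reply from this one, move on to the next
    raise TimeoutError(f"no nameserver answered: {failed}")


# Simulated outage: every server run by provider A is unreachable.
def fake_query(server):
    if "provider-a" in server:
        raise TimeoutError(server)
    return "192.0.2.80"


# With a second provider on the list, the lookup still succeeds.
server, answer = query_with_failover(
    ["ns1.provider-a.example", "ns2.provider-a.example", "ns1.provider-b.example"],
    fake_query,
)
```

If the list only contains provider A's servers, the same call raises `TimeoutError`, and every dead server tried along the way costs a full timeout first.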
I don't know, I don't think it'd be as much as 4 - 7 days before Christmas. Most people know that it's too late to order from Amazon or any other online-only store by the night of Dec. 23. It's probably more money than they'd lose normally, but maybe not that significant?
I have no data to back that up, it's pure speculation.
It really depends on the downtime. If it was more than, say, an hour, a lot of people would probably buy elsewhere; a small amount of downtime you wouldn't think would affect sales, though.
Why wouldn't such a company run their own name servers? I understand it's "yet another thing to maintain," but I've set up bind before... didn't seem that bad.
This is one of those services that a dedicated provider can sometimes do better than internal IT. UltraDNS, for example, has secure secondaries colocated with large ISPs, so you get some good protection against cache poisoning attacks. Everything they do you could do yourself, but it would cost you a lot more than what they charge. (Full disclosure: I am a customer and until this evening I was quite happy with their service and reliability.)
Yep, they are, to all the tenants in AWS datacenters.
My apartment however currently only can talk to Time Warner which talks to Level 3 and then, finally Amazon. I believe what the parent meant was that UltraDNS is colo'd with end-user ISPs like Time Warner, Verizon, etc.
Right, a lot of DNS attacks rely on the attacker replying to a request faster than the actual DNS server and there's no one closer to the end user than ISPs.
It's relatively trivial to outsource. It's one of those services that's easy to measure, quantify, and manage (from an outsourced perspective). There's also a bit more to it than that. The Anycast routing can be quite difficult to set up and maintain. It's virtually useless outside of a very small set of protocols (DNS being one of them), so it wouldn't make sense for Amazon to bring that kind of talent in-house for something like DNS.
The mistake was outsourcing to only one provider. It's easy enough to set up a BIND slave elsewhere that automatically transfers the zone from your primary provider.