Amazon goes down; takes S3, Salesforce, Target, and others with them. (techcrunch.com)
78 points by Sam_Odio on Dec 24, 2009 | 46 comments



And people told me I was crazy for hard-coding IP addresses for s3.amazonaws.com and sdb.amazonaws.com into the Tarsnap code... :-)


You are crazy for doing that - one instance of downtime doesn't justify ignoring all the advantages DNS brings.


When I made that design decision I wasn't considering the possibility of DNS outages at all; I was just thinking in terms of "there's a huge number of places between me and ultradns where someone could insert a spoofed DNS response".


What if Amazon gets a new netblock assigned?

Can you forcibly upgrade the software in that case ?


Of course I can upgrade the software; and there's nothing forcible about it. We're talking about code I'm running on my server here...


Ah, my bad, sorry; I thought you meant in the Tarsnap client. Then I really don't know why people would make a fuss over you hardcoding the IPs in there. All you need to do is keep an eye out in case they change them (which you could even automate).
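That automated check could be a cron job along these lines (a rough sketch; the pinned addresses here are made up, and the real ones would come from whatever is hard-coded in the server):

```python
import socket

def diff_endpoints(pinned, current):
    """Compare the hard-coded address set against a fresh lookup.

    Returns (gone, new): pinned IPs that no longer resolve, and
    IPs that appeared in DNS but aren't in the pinned set.
    """
    gone = pinned - current
    new = current - pinned
    return gone, new

def resolve_all(host):
    """All IPv4 addresses the normal resolver returns for host."""
    return {info[4][0] for info in socket.getaddrinfo(host, 443, socket.AF_INET)}

# Shown with dummy data rather than a live lookup; in practice you'd
# call resolve_all("s3.amazonaws.com") and alert if either set is non-empty.
gone, new = diff_endpoints({"72.21.202.1", "72.21.202.2"},   # hypothetical pinned IPs
                           {"72.21.202.2", "72.21.202.3"})   # hypothetical fresh lookup
print(gone, new)
```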


All you need to do is keep an eye out in case they change them

Exactly, and that's what I do. (With the caveat that I look for AWS endpoints being taken out of service, not for changes in what DNS tells me.)


Are there DNS servers that support versioning? The best solution I could imagine would simply be to set Tarsnap to normally use the current DNS records for S3, but be able to roll back to a valid zone record if they encounter an update that makes the servers stop resolving.


The Tarsnap client only talks to the Tarsnap server, but I do cache that lookup in order to avoid problems with glitchy DNS resolution.

I could have the Tarsnap server cache DNS lookups if I was only worried about working around DNS outages -- but as I said, that wasn't something I was considering at all when I made the decision to eschew DNS.


Some DNS caches will continue to try giving out expired records if no newer ones can be found. Unclear that it's worth it, though.
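That "serve stale" behavior can be sketched as a cache that falls back to an expired entry when re-resolution fails (a toy illustration with a fake resolver and fake clock, not any particular resolver's implementation):

```python
import time

class StaleFallbackCache:
    """TTL cache that serves an expired entry if re-resolution fails."""
    def __init__(self, resolve, ttl=300, clock=time.monotonic):
        self.resolve = resolve   # callable: name -> address (may raise OSError)
        self.ttl = ttl
        self.clock = clock
        self.entries = {}        # name -> (address, fetched_at)

    def lookup(self, name):
        entry = self.entries.get(name)
        if entry and self.clock() - entry[1] < self.ttl:
            return entry[0]      # fresh hit
        try:
            addr = self.resolve(name)
        except OSError:
            if entry:
                return entry[0]  # upstream down: hand out the stale record
            raise
        self.entries[name] = (addr, self.clock())
        return addr

# Demo: resolve once, then simulate an outage after the TTL expires.
now = [0.0]
answers = {"s3.example": "192.0.2.1"}       # hypothetical name and address
def fake_resolve(name):
    if name not in answers:
        raise OSError("resolver unreachable")
    return answers[name]

cache = StaleFallbackCache(fake_resolve, ttl=300, clock=lambda: now[0])
print(cache.lookup("s3.example"))   # fresh lookup: 192.0.2.1
del answers["s3.example"]           # the upstream resolver goes dark
now[0] = 1000                       # the cached entry is now well past its TTL
print(cache.lookup("s3.example"))   # still 192.0.2.1, served stale
```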


you can expect such downtimes every Christmas. why? ask yourself how much protection money UltraDNS was told to pay.


Awesome ... was that for security or for redundancy? Did you have a solution for what would happen if the IP address changed?


Mostly security -- DNS is one of the worst offenders when it comes to protocols with security problems, both in terms of protocol issues and bugs in implementations.

Amazon doesn't move endpoints very often, and I round-robin requests to several endpoints; so in the rare cases an AWS endpoint changes I see a slight decrease in Tarsnap performance and can make the Tarsnap server stop using the dead endpoint before any users are likely to notice.
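The round-robin-with-manual-removal scheme described there might look roughly like this (a hypothetical sketch with made-up documentation IPs, not Tarsnap's actual code):

```python
import itertools

class EndpointPool:
    """Round-robin over several endpoints, with manual removal of dead ones."""
    def __init__(self, endpoints):
        self.endpoints = list(endpoints)
        self._cycle = itertools.cycle(self.endpoints)

    def next(self):
        return next(self._cycle)

    def remove(self, endpoint):
        # Dropping a dead endpoint just shrinks the rotation; the remaining
        # endpoints absorb its share of requests, so users see at most a
        # slight performance dip rather than failed requests.
        self.endpoints.remove(endpoint)
        self._cycle = itertools.cycle(self.endpoints)

pool = EndpointPool(["198.51.100.1", "198.51.100.2", "198.51.100.3"])  # hypothetical IPs
print([pool.next() for _ in range(3)])   # rotates through all three
pool.remove("198.51.100.2")              # operator notices a dead endpoint
print([pool.next() for _ in range(2)])   # rotation continues without it
```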


Hmmm... should I trust my backups to generic software vendor X, or to someone who clearly takes nothing for granted... tough choice. Signing up now.


It is DNS. If you put the EC2/S3 addresses into /etc/hosts, the services work fine. It's affecting lots of other big websites as well, apparently (Target, Salesforce), because they all outsource to UltraDNS.
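The /etc/hosts workaround is just a static entry that the resolver consults before DNS. For example, using the amazon.com address reported elsewhere in this thread (the S3/EC2 addresses would have to come from a still-working resolver or a cached lookup):

```
# /etc/hosts -- static name-to-address mappings, checked before DNS
72.21.207.65    amazon.com www.amazon.com
```

The obvious caveat: you have to remember to remove the entry once DNS recovers, or you'll silently pin yourself to an address that may later go away.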


NANOG chatter confirming it is an issue with UltraDNS. Seems to be west coast related.

EDIT: Potentially a DoS attack. From NANOG: "We have some DNS providing type customers (not UltraDNS) receiving a few million packets/sec of UDP/53 DoS traffic, starting at about the same time as the UltraDNS problems. No clue if it's related, but it certainly sounds suspicious. :)"


I'm very surprised that they rely on a single vendor. But I guess DNS is one of those things you don't think about until it fails.


The point of using UltraDNS is that they provide fast, "real-time" failover of DNS routing. This is used, for example, for fault tolerance, where the IP address responding to a domain name might change due to failure scenarios or load balancing (where a different server is now the primary responder for the domain name, maybe located in a different data center or country). Infrastructure-wise, UltraDNS is kind of like the Akamai of DNS instead of content distribution.


I thought DNS was supposed to try backup servers automatically... any DNS experts able to explain what's going on? Some of the UltraDNS servers are returning (correct) values, others simply not responding.


Yeah, uh, if you are smart, you have a secondary DNS provider. But that really requires managing it yourself. The problem is that many people outsource, which usually means going with only one provider. (Now, UltraDNS does have a good setup; they probably aren't a bad choice for a provider, but having only one is just plain stupid.)


Never let truth get in the way of a good headline.


Haha.. true. What else can we expect from Techcrunch?


It doesn't say in the TC article and I can't really tell from this thread. How long was Amazon actually down?


I wonder how much money Amazon loses per minute two days before Christmas. Ouch.


I don't think that Amazon itself will lose much, since they're well-known enough that most people who fail to reach them will retry later.

I think the small fry using S3 and EC2 will be the ones who are actually hit by this.


I don't know, I don't think it'd be as much as 4 - 7 days before Christmas. Most people know that it's too late to order from Amazon or any other online-only store by the night of Dec. 23. It's probably more money than they'd lose normally, but maybe not that significant?

I have no data to back that up, it's pure speculation.


At $20B annual, assuming Christmas is a big chunk, I'm guessing $2B (10%) in the last two weeks, 2 minutes would then be worth $200,000?

I think my 10% number is too low, so maybe $500K for two minutes? But this is not profit, just revenue.
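Spelling out that back-of-the-envelope math (the $20B and 10% figures are the parent's guesses, not data):

```python
annual_revenue = 20e9           # guessed annual revenue, $20B
holiday_share = 0.10            # guessed fraction earned in the last two weeks
holiday_minutes = 14 * 24 * 60  # minutes in two weeks: 20,160

per_minute = annual_revenue * holiday_share / holiday_minutes
print(round(per_minute))        # roughly $99,000 of revenue per minute
print(round(2 * per_minute))    # so about $200K for a two-minute outage
```

Doubling the 10% share to 20%, as the parent suggests, doubles both figures, which is where the ~$500K-for-two-minutes estimate comes from.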


Do you think it might catch up once it is available again? It is a pretty serious shopping destination, and possibly people retry.


It really depends on the downtime. If it was more than, say, an hour, a lot of people would probably buy elsewhere; you wouldn't think a small amount of downtime would affect sales, though.


Surely they mean "Amazon goes down and takes the Internet with it".


Only if TC was hosted there.


It looks like a DNS issue, since while http://amazon.com isn't working for me http://72.21.207.65 is (kind of).

EDIT: "No A records were found for amazon.com" http://www.zoneedit.com/lookup.html?host=amazon.com&type...


When Rackspace goes down and takes TechCrunch with them, the whole INTERNET goes down. When Amazon goes down, Salesforce, Target, and others...


It looks to be back up now.


Why wouldn't such a company run their own name servers? I understand it's "yet another thing to maintain," but I've set up BIND before... didn't seem that bad.


This is one of those services that a dedicated provider can sometimes do better than internal IT. UltraDNS, for example, has secure secondaries colocated with large ISPs, so you get some good protection against cache poisoning attacks. Everything they do you could do yourself, but it would cost you a lot more than what they charge. (Full disclosure: I am a customer, and until this evening I was quite happy with their service and reliability.)


"Colocated with large ISPs so you get some good protection against cache poisoning attacks"

Ah yes, that's not something I factored into my equation. Thanks!


Amazon IS a large ISP though...


Yep, they are, to all the tenants in AWS datacenters.

My apartment, however, can currently only talk to Time Warner, which talks to Level 3, and then finally Amazon. I believe what the parent meant was that UltraDNS is colo'd with end-user ISPs like Time Warner, Verizon, etc.


Right, a lot of DNS attacks rely on the attacker replying to a request faster than the actual DNS server and there's no one closer to the end user than ISPs.


It's relatively trivial to outsource. It's one of those services that's easy to measure, quantify, and manage (from an outsourced perspective). There's also a bit more to it than that: Anycast routing can be quite difficult to set up and maintain. It's virtually useless outside of a very small set of protocols (DNS being one of them), so it wouldn't make sense for Amazon to bring that kind of talent in-house for something like DNS.


Anycast routing can be quite difficult to set up and maintain.

Anycast's not particularly difficult to set up and maintain.


The mistake was outsourcing to only one provider. It's easy enough to set up a BIND slave elsewhere that automatically transfers the zone from your primary provider.
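A slave zone in BIND's named.conf is only a few lines (hypothetical zone name and documentation address shown; the masters entry would be your primary provider's zone-transfer server):

```
zone "example.com" {
    type slave;
    masters { 203.0.113.10; };     // primary provider's AXFR server (hypothetical)
    file "slaves/example.com.db";  // local copy, refreshed per the zone's SOA timers
};
```

The slave then re-transfers the zone automatically whenever the primary's serial number bumps, so it stays current without manual syncing.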


I'm not sure that "setup a BIND slave" is quite that easy for someone with as much traffic as Amazon.com.


I'm sure they could hire the muscle to do it.

I'm just saying, relying on just one company is usually a bad idea.


The Internet can be so fragile.



