Hacker News new | past | comments | ask | show | jobs | submit login

AWS experienced a major outage a few years ago that couldn't be communicated to customers because it took out all the components central to update the status board. One of those obvious-in-hindsight situations.

Not long after that incident, they migrated it to something that couldn't be affected by any outage. I imagine Google will probably do the same thing after this :)




The status page is the kind of thing you expect to be hosted on a competitor network. It is not dogfooding but it is sensible.

Reminds me of when I was working with a telecoms company. It was a large multinational company and the second largest network in the country I was in at the time.

I was surprised when I noticed all the senior execs were carrying two phones, of which the second was a mobile number on the main competitor (ie the largest network). After a while, I realised that it made sense, as when the shit really hit the fan they could still be reached even when our network had a total outage.


> Not long after that incident, they migrated it to something that couldn't be affected by any outage.

Like the black box on an airplane, if it has 100% uptime why don’t they build the whole thing out of that? ;)


Was just reading it, they made their status page multi-region.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: