
> including unfortunately the tooling we usually use to communicate across the company about outages.

There's some irony in that.




Edit: and I agree!

I’m not in SRE so I don’t bother with all the backup modes (direct IRC channel, phone lines, “pagers” with backup numbers). I don’t think the networking SRE folks are as impacted in their direct communication, but they are (obviously) not able to get the word out as easily.

Still, it seems reasonable to me to use tooling for most outages that relies on “the network is fine overall”, to optimize for the common case.

Note: the status dashboard now correctly highlights (Edit: with a banner at the top) that multiple things are impacted because Networking. The Networking outage is the root cause.


> the status dashboard now correctly highlights that multiple things are impacted because Networking.

This column of green checkmarks begs to differ: https://i.imgur.com/2TPD9e9.png


This is a person who's trying to help out while on vacation... can we try being more thankful, and not nitpick everything they say?


Thanks! I’ll leave this here as evidence that I should rightfully reduce my days off by 1 :).


The banner at the top. Sorry if that wasn’t clear.


While not exactly Google Cloud, the G Suite dashboard seems accurate: https://www.google.com/appsstatus#hl=en&v=status


For me, at least, that was showing as all green for at least 30 minutes.


AWS experienced a major outage a few years ago that couldn't be communicated to customers because it took out all the components needed to update the status board. One of those obvious-in-hindsight situations.

Not long after that incident, they migrated it to something that couldn't be affected by any outage. I imagine Google will probably do the same thing after this :)


The status page is the kind of thing you expect to be hosted on a competitor network. It is not dogfooding but it is sensible.

Reminds me of when I was working with a telecoms company. It was a large multinational company and the second largest network in the country I was in at the time.

I was surprised when I noticed all the senior execs were carrying two phones, the second of which had a mobile number on the main competitor (i.e. the largest network). After a while, I realised that it made sense: when the shit really hit the fan, they could still be reached even if our network had a total outage.


> Not long after that incident, they migrated it to something that couldn't be affected by any outage.

Like the black box on an airplane, if it has 100% uptime why don’t they build the whole thing out of that? ;)


Was just reading it, they made their status page multi-region.


Even more irony: Google+ shown as working fine: https://i.imgur.com/52ACuiY.png


G+ is alive and well for G Suite subscribers, not the general users.



