I’m not in SRE so I don’t bother with all the backup modes (direct IRC channel, phone lines, “pagers” with backup numbers). I don’t think the networking SRE folks are as impacted in their direct communication, but they are (obviously) not able to get the word out as easily.
Still, it seems reasonable to me to use tooling for most outages that relies on “the network is fine overall”, to optimize for the common case.
Note: the status dashboard now correctly highlights (Edit: with a banner at the top) that multiple things are impacted because Networking. The Networking outage is the root cause.
AWS experienced a major outage a few years ago that couldn't be communicated to customers because it took out all the components central to update the status board. One of those obvious-in-hindsight situations.
Not long after that incident, they migrated it to something that couldn't be affected by any outage. I imagine Google will probably do the same thing after this :)
The status page is the kind of thing you expect to be hosted on a competitor network. It is not dogfooding but it is sensible.
Reminds me of when I was working with a telecoms company. It was a large multinational company and the second largest network in the country I was in at the time.
I was surprised when I noticed all the senior execs were carrying two phones, of which the second was a mobile number on the main competitor (ie the largest network). After a while, I realised that it made sense, as when the shit really hit the fan they could still be reached even when our network had a total outage.
There's some irony in that.