That's because hosting providers have their systems set up to route around contained failures. Single failures are transparently compensated for, so you never notice them. The only system that could take down the entire network is (mostly) the load-balancing/failure-compensating system itself, so it makes sense that the bugs in these systems are almost exclusively cascade failures.
I know neteng isn't the simplest thing in the world, but I was struck that for all the "Google interview... dummies don't even think about it" stories (not to mention the Microsoft mockery of the last couple of decades), the fix was first to reboot everything, then, when that lumped too much traffic in the wrong places, to reboot everything more slowly four hours later, which fixed everything in 35 minutes.
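The arithmetic behind "reboot more slowly" is simple enough to sketch. A toy model (all numbers made up for illustration, not taken from the incident): while a batch of routers is restarting, their traffic lands on the survivors, so big batches concentrate load and can blow past per-node capacity, while a rolling restart keeps everyone under the limit.

```python
# Toy sketch: why restarting routers in big batches overloads the
# survivors, while a slow rolling restart keeps per-node load sane.
# TOTAL_LOAD, NODES, and CAPACITY are hypothetical illustration values.

TOTAL_LOAD = 100.0   # arbitrary traffic units across the fleet
NODES = 10
CAPACITY = 15.0      # per-node limit before a segment "blows out"

def peak_load_per_node(batch_size: int) -> float:
    """Restart `batch_size` nodes at once; assume traffic is spread
    evenly over whichever nodes are still up."""
    survivors = NODES - batch_size
    return TOTAL_LOAD / survivors

# Restarting half the fleet at once concentrates traffic past capacity:
print(peak_load_per_node(5))   # 20.0 -- over the 15.0 limit, cascade risk
# A one-at-a-time rolling restart stays within the limit:
print(round(peak_load_per_node(1), 1))   # 11.1 -- survivors stay healthy
```

The model ignores reconvergence time and uneven traffic distribution, which is presumably where the "wrong places" part of the real outage came in.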
Think "load-balancing service" when you see "traffic router" in this case. This was not necessarily a matter of physically rebooting a Juniper box or something.
That's one interpretation, but from what little information they offer, it seems their initial "reboots" (of whatever form) introduced asymmetric traffic loads that blew out some segments, after which they chased fixes for a couple of hours.