Why would Microsoft be disproportionately affected by this? Are we expecting similar decrees from AWS and GCP? Or was Microsoft operating with less runway before this began?
I work with all three of them, so here's my two cents:
1. Mostly traditional, "legacy" companies have been hit hard by this. Ones that don't have the culture or the technology for working from home. Those companies use some Microsoft products. Also, Microsoft has been poaching them heavily, handing out trials, bundling licenses and so on. A lot of them don't actually buy stuff from Microsoft directly, but through 3rd party vendors which have incentives of their own. Also, some of the end users don't actually want to use AWS.
2. I actually think Microsoft has much less runway. From what I understand, AWS has a more modern infrastructure and backend, they can shuffle resources around between services easily, and I think they have much more in reserve. Microsoft has concentrated much more on the sheer number of regions.
3. Azure has a strange way of handling quotas, if you ask me. Up until now, once you provision a VM, it is deducted from your quota and stays deducted for as long as the VM exists. Once you had it, actually powering it on was never an issue (unlike AWS). It isn't billed, but we always assumed it stayed reserved. Since last week, you can see failures not only when provisioning VMs (even within your quota) but also when starting them. I also think a lot of users had larger quotas allocated than they actually used. So they just started creating more VMs or other resources (because they could), and the thing came crashing down. I think that's just poor planning on Microsoft's side.
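To make the quota point concrete, here is a toy model in Python. It is only a sketch of the behaviour described above, not how Azure works internally: `Region`, `Subscription`, `free_slots` and both exception names are made up for illustration. The idea is that quota is a per-customer counter charged at provisioning time, while capacity is a shared physical pool that is only checked when a VM starts.

```python
from dataclasses import dataclass, field


class QuotaExceeded(Exception):
    """The subscription has no quota left for another VM."""


class CapacityExhausted(Exception):
    """The region has no free physical capacity right now."""


@dataclass
class Region:
    # Hypothetical unit: physical capacity shared by every customer.
    free_slots: int


@dataclass
class Subscription:
    quota: int  # maximum number of VMs that may exist, running or not
    vms: dict = field(default_factory=dict)  # name -> "stopped" | "running"

    def provision(self, name: str) -> None:
        # Quota is charged when the VM is created and held until it is deleted.
        if len(self.vms) >= self.quota:
            raise QuotaExceeded(name)
        self.vms[name] = "stopped"

    def start(self, name: str, region: Region) -> None:
        # Starting needs real hardware, which the quota never reserved.
        if self.vms[name] == "running":
            return
        if region.free_slots == 0:
            raise CapacityExhausted(name)
        region.free_slots -= 1
        self.vms[name] = "running"
```

With `Region(free_slots=1)` and `Subscription(quota=2)`, you can provision two VMs and start the first, but starting the second raises `CapacityExhausted` even though it sits well within quota. If everyone's quotas add up to more than the region physically holds, this is what you'd expect to see once people start using them.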
But the thing I'm most pissed off about is the status page. VMs are failing left, right and center and everything is nice and green on the status page. Once you open a ticket, they send you an incident-in-progress report.
> But the thing I'm most pissed off about is the status page. VMs are failing left, right and center and everything is nice and green on the status page. Once you open a ticket, they send you an incident-in-progress report.
Status pages parroting lies in service of marketing should incur more liability for companies than they do. How does a society discourage vendors from doing things like this?
It should eventually fall under consumer protection laws. Failing to report incidents on a status page is like failing to mention trans fat on a package label.
>Your tortillas with partially hydrogenated vegetable oil? Yeah, it's less than 0.5g per serving so they get to say 0g.
Personal favorite: Tic Tacs are 94.5% sugar, yet they can be marketed as a "sugar free" food.
"The Nutrition Facts for Tic Tac® mints state that there are 0 grams of sugar per serving. Does this mean that they are sugar free?
Tic Tac® mints do contain sugar as listed in the ingredient statement. However, since the amount of sugar per serving (1 mint) is less than 0.5 grams, FDA labeling requirements permit the Nutrition Facts to state that there are 0 grams of sugar per serving."
At the moment it's reputational enforcement: if enough people badmouth them on Hacker News, the money will flow to competitors. (There are other ways to enforce norms, but they have their own problems.)
> Mostly traditional, "legacy" companies have been hit hard by this
Yes, this. Where I work, WFH was banned outright until this all hit the fan. Now, anyone who can do so has a mandate to WFH. We use Citrix to access some internal resources, though it's really inefficient: if you want to access a file on a network drive, say an Excel file, you need to use Citrix to launch an Excel instance, open the file, and deal with it there. You can't easily get it to your local computer, so this adds to the load on the system, which has been strained of late. (It's not like Citrix lacks the ability to transfer files between remote and local; they've just disabled mapping of local drives during a session.) Luckily I have a server-class workstation under my desk at work for heavy data-crunching, and I set up my own VPN, which works much better. At least unless/until they turn the power off in buildings that aren't being used.
I think Azure also has more regions than AWS, yet probably a smaller overall capacity, so it's gotta be harder to keep spare capacity. It's notable that the shortages are isolated to specific regions. Those were possibly small to start with, so it probably didn't take as much to hit capacity there? Just a guess.
That's my hunch also... Speaking as someone interested in IaaS/VDI, not all instance types are available in all regions. For example, instances with GPUs are available only in North and West Europe. If you need them and they are out of stock, the UK or France regions are of no use to you.
Mostly on deployment (after the process has actually started), but sometimes after (on start).
Since we use Azure a lot through the API, there is a difference in the first case. It used to fail on the first API call, which would just tell you this or that went wrong. Now the process fails after it has already been started.
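For anyone scripting around this, the practical difference is that checking the first call is no longer enough; you have to keep polling until the operation reaches a final state. A minimal Python sketch of that, with the loud caveat that `client.submit`, `client.status`, the state names and `FakeClient` are all hypothetical stand-ins and not the real Azure SDK:

```python
import time


class RejectedAtSubmit(Exception):
    """The API refused the request on the first call (the old behaviour)."""


class FailedAfterStart(Exception):
    """The request was accepted but the deployment failed later (the new behaviour)."""


def deploy(client, spec, poll_interval=0.0, max_polls=10):
    """Submit a deployment and wait until it reaches a final state."""
    try:
        op_id = client.submit(spec)  # hypothetical call; may raise immediately
    except Exception as exc:
        raise RejectedAtSubmit(str(exc)) from exc

    for _ in range(max_polls):
        state = client.status(op_id)  # "running" | "succeeded" | "failed"
        if state == "succeeded":
            return op_id
        if state == "failed":
            raise FailedAfterStart(op_id)
        time.sleep(poll_interval)
    raise TimeoutError(op_id)


class FakeClient:
    """Stand-in for a real SDK; replays a scripted sequence of states."""

    def __init__(self, states, reject=False):
        self._states = list(states)
        self._reject = reject

    def submit(self, spec):
        if self._reject:
            raise ValueError("request refused")
        return "op-1"

    def status(self, op_id):
        return self._states.pop(0)
```

`deploy(FakeClient(["running", "succeeded"]), {})` returns `"op-1"`, while `deploy(FakeClient(["running", "failed"]), {})` raises `FailedAfterStart`. Keeping the two exception types separate lets the caller decide, for example, to clean up half-created resources only in the second case.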
Office 365 runs on Azure in the same regions other workloads do (There's probably some replication going on behind the scenes too, since you don't pick your instance), so I'd bet that tons of new Office 365 subscriptions in the past month are what is causing this, not people deciding to lift-and-shift their app workloads because of the virus. Teams adoption has probably shot way up too.
The GCP outages don't seem to have been capacity related though, they were both networking issues. In one of them they broke their traffic routing to a datacenter in Atlanta, and in the other one an unnamed cloud provider (that pretty much has to be AWS) broke inbound inter-cloud traffic in us-east1 (so presumably AWS borked the link from Google's us-east1 to AWS us-east-1). These seem like normal sorts of cloud issues that happen to coincide with covid-19.
If anybody at AWS is reading this: congratulations! Thank you! Keeping infrastructure running is hard in the best of times, and these times definitely are not the best.
I imagine companies are flocking to Microsoft's virtual desktop solutions more than AWS or Google. It is the most well-known provider among managers who need to make a snap decision right now on how to have people work from home, if I had to guess.
Yeah as a member of Azure networking, it's odd: HR is communicating we should do video calls for meetings instead of just audio, to keep the personal touch, and I'm wondering if that's really the best use of our bandwidth. I keep my video off anyway.
A huge portion of this has got to be MS's own services, e.g. tons of people who normally use Office at work are now using Office 365 online and doing a ton more videoconferencing with Teams.
All Office 365 company customers were also dormant Teams customers. There's a lot of these companies and some non-trivial portion of those have now become active Teams users. That probably drives a big portion of the increased conference traffic at least.
A lot of companies and educational institutions already had Teams licenses through their Office 365 contracts, but weren't really using Teams. Now they're using it, a lot, because "it's already paid for". Simple as that.
Because a lot of people have their Office 365 subscription and maybe just use email. All of a sudden they realized they are paying for all this other stuff they aren't using so they're leveraging it now that they need it.
If you're shopping for cloud providers, nobody gets fired for picking AWS. Microsoft is the pick if you're looking for email, spreadsheets, and desktop publishing.