> Since Microsoft’s other businesses are either hardware or office software, with the exception of maybe XBox’s gaming, it generally lacks operational experience in running “always on” services and preparing for unanticipated traffic surges.
Uh, as a former Microsoft equivalent of an SRE, this is an utterly ridiculous statement.
I was in the Windows division on a team handling all the incoming telemetry - we definitely knew how to prepare for traffic surges and keep our services "always on".
Has the author never seen Office 365? Bing? The entirety of Azure in general, which has dozens if not hundreds of public-facing, always-on services?
>> we definitely knew how to prepare for traffic surges and keep our services "always on"
Okay, but as a heavy customer of AWS for 10 years and Azure & GCP for 6 years... there's some very valid criticism of the capacity planning and elasticity.
I hit capacity, provisioning and scaling issues all the time in Azure regions. And many times the cause is not obvious because most services seem to be unaware that the resources are exhausted, and fail with seemingly unrelated errors and partial failure.
edit: and to be fair, I'm not saying it's easy to build something like Azure or that I would do better. And I have a huge list of cases where Amazon's product engineering is terrible compared to Azure's. But... the problems I've seen all week in the UK aren't imaginary. I've had no such problems on Amazon.
> there's some very valid criticism of the capacity planning and elasticity.
Indeed. Getting quota increases is also a much slower and more involved process compared to AWS or GCP.
> and fail with seemingly unrelated errors and partial failure.
Oh, don't get me started! If managers were forced to work with the Azure API, they would not choose it as the provider. The unclear error responses are the most aggravating part.
I don't disagree with anything you are saying here, but I was really taken aback by the statement the author made, as it felt right out of the early 2000s, ignoring the last 20 years of Microsoft's shift to services and the cloud.
Azure clearly prioritized massively increasing their geographic footprint without the availability zones that AWS almost always deploys when they enter a new region, and I have no doubt they run leaner, with less compute available than AWS.
When I was on the telemetry team about 3 years ago, we were in the early days of transitioning from the internal only "physical cloud" that the Bing team had created years before into Azure, and it was full of pain for a service our size at that time.
You are judging by what actions you were taking to make things better. Everyone else judges by what results they see.
Based on my experience, Azure is not as good as AWS and Google at actually keeping things up and running when the shit hits the fan. And AWS is not as good as Google. (That said, AWS exposes a lot more to customers than Google does. Google may stay up, but I prefer running software in the AWS cloud.)
That isn't to say that all three aren't doing a lot right. Clearly they all are. And clearly they are all doing a lot right that I don't know how to do. But the fact that they do a lot right doesn't mean that they are equal.
I have used all 3 cloud providers, and AWS is the most reliable but hard to configure. Azure is much easier to use and requires the least expertise. I don't see any reason to use Google Cloud, because it had a couple of outages last year and even their UI is quite buggy. I noticed that every time someone posts something negative about Google here, their employees cowardly try to downvote such posts... So I will expect the same here.
Office 365 was derisively referred to as “Office 360” with some good reason years ago. (I use it and the uptime has gotten noticeably better. That it could get noticeably better is the problem.)
I think they have the experience, but the success has not always been great. I used to have a Hotmail account and was not impressed with how Microsoft managed it compared to things like Gmail (downtime, etc.).
Same thing during the early Office 365 days. These systems were rough.
Zoom and AWS seem to be handling scaling pretty well - I'm a bit confused though as to how much extra capacity it would make sense for AWS to be carrying or maybe they aren't being asked to scale the way microsoft is which is my guess.
> I'm a bit confused though as to how much extra capacity it would make sense for AWS to be carrying or maybe they aren't being asked to scale the way microsoft is which is my guess.
AWS has a crapton of capacity. So much so that they were the first ones to come up with the 'spot' instance concept. We could easily get hundreds of spot instances most days, even in a popular region like US-West 2. Those were relatively big ones too, with a minimum of 16 vCPUs and 64GB, but given the spot pricing, we would request even bigger ones if they were available. Even when they were terminated, we would usually get a replacement instance in a few minutes. Even faster if the fleet request was flexible and didn't need a particular instance type.
AWS will also honor pretty large quota increases without batting an eye, while Azure (and sometimes even GCP) would send similar numbers "for review" and take their sweet time.
I can only remember a couple of times, years ago, where requests for normal, on-demand instances failed to get satisfied, in a particular availability zone. But the other zones would be fine.
Most Azure regions do not even _have_ availability zones, so that's out.
AWS is bigger but I doubt they have seen the same type of change in traffic as I don't think they have the same traffic patterns that would be affected by something like this.
Teams was used before, but suddenly my company had 100k office workers working from home and all meetings converted to Teams meetings, many with video streaming. People also seem to be reacting by booking even more meetings now that casual in-person chat is gone.
Isn't Zoom on AWS? So are Slack and Chime (the little-known AWS competitor to Teams). As are parts of Netflix, Disney+, Hulu, and of course Prime Video.
I don't know the exact numbers but I'm pretty sure the rise in popularity of Zoom is just as high, if not possibly higher, than Teams. Entire universities of 50k+ students started using Zoom for online classes, not to mention all of the companies that have started using it. Then you have all of the people staying at home using the streaming services, etc... my guess is that AWS has seen a huge spike in traffic just as big, if not bigger than Azure saw.
>The software giant counted 32 million daily active users of Teams last week on March 11th, but this jumped by 12 million to 44 million daily active users yesterday.
Microsoft seems to have had an increase similar to Zoom's whole user base in a week (and I don't know how much was added before that).
I think people in tech circles underestimate how many large non-tech companies just followed Microsoft into the cloud and started to use Office 365. And I assume a lot don't use it that much, maybe for a meeting here and there and a few (or way too many) channels but suddenly everyone in the company is on it.
“To put this growth in context, as of the end of December last year, the maximum number of daily meeting participants, both free and paid, conducted on Zoom was approximately 10 million. In March this year, we reached more than 200 million daily meeting participants, both free and paid.”
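To put the two sets of quoted figures side by side, here is a rough sketch. It only uses the numbers quoted in this thread, and they are not the same metric (daily active users for Teams, daily meeting participants for Zoom) or the same time window, so treat it as back-of-the-envelope arithmetic rather than a real comparison. The function name `growth` is just something made up for the example.

```python
def growth(before: float, after: float) -> tuple[float, float, float]:
    """Return (absolute increase, percent increase, multiple) between two counts."""
    absolute = after - before
    return absolute, absolute / before * 100, after / before


# Figures in millions, as quoted in this thread.
# Note: different metrics and different periods, so not directly comparable.
figures = {
    "Teams daily active users (Mar 11 -> about a week later)": (32, 44),
    "Zoom daily meeting participants (Dec -> Mar)": (10, 200),
}

for name, (before, after) in figures.items():
    absolute, percent, multiple = growth(before, after)
    print(f"{name}: +{absolute}M, +{percent:.1f}%, {multiple:.2f}x")
```

Depending on whether you look at the absolute jump over a week or the multiple over a quarter, either service can be made to look like the one that got hit harder.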
This seems wrong to me. Isn’t almost a third of the world's traffic going through AWS? Think about all the video conferencing calls, streaming videos, and gaming happening now. ISPs were almost crumbling in Europe last week. AWS is almost certainly experiencing more traffic than Azure.
> AWS will also honor pretty large quota increases without batting an eye, while Azure (and sometimes even GCP) would send similar numbers "for review" and take their sweet time.
I have so many bad experiences with limit increases on AWS.
I just went through one trying to get concurrent AWS Lambda executions increased from the default 1.000 in a region to 10.000. I had to open a support case, fill out an optional questionnaire (which turned out to be not optional at all), the support engineer had to forward the questionnaire to the AWS Lambda service team and they came back a few days later with: oh, you can have 5.000 concurrent executions.
Even worse a few years ago (not sure if that's still the case) we tried to get an increase of the sum of storage available as EBS-volumes. That took quite a bit back and forth with AWS and in the end we learned that they do reserve such EBS-storage for a customer no matter if the customer uses it or not. And mind: We didn't request something like guaranteed available EBS-storage, but instead just the ability to request more storage.
EC2 and ELB increases are relatively easy to come by, but for some services like Media Services, requests have to be reviewed by the service team, which can take a while.
Yeah, really weird statement for the author to make, particularly given that at the top of the very same section they list a bunch of cloud-based MS products.
Agreed, Microsoft has been running cloud services for a long time, even before the cloud got its name. Things like Windows Update, Xbox Live, MSN, Hotmail... some of them have been around for 20+ years.
> One misleading headline that’s been pushed out by Microsoft and unfortunately picked up by various tech media outlets, is the “775% increase” in Azure cloud service usage in geographical areas that are most committed to some form of social distancing or shelter-in-place policy.
It's way worse than that. The original statement:
> We have seen a 775 percent increase of our cloud services in regions that have enforced social distancing or shelter in place orders
The updated statement:
> We have seen a 775 percent increase in Teams' calling and meeting monthly users in a one month period in Italy, where social distancing or shelter in place orders have been enforced.
Somehow an increase in usage of a single feature of a single product in a single country was miscommunicated as an increase in demand for all of Azure's cloud services. Microsoft's PR team royally screwed up on this one.
This "single feature" is video conferencing, and it is used by almost every Office 365 business customer. As a chat offering, it's Slack's biggest competitor.
There is no feature bigger in bandwidth than this one.
I believe the numbers are similar in Europe.
For reference, I think it's similar to having a Netflix-sized business appear from nowhere.
But is this a feature many actually used before? If it was used, e.g., 10 hours before and 87.5 hours now, it's insignificant.
A percent increase is a meaningless metric without the baseline. If it's "all of Azure" then we immediately know it's a huge increase! But if it's just a possibly seldom-used feature, it's minimal.
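A quick sketch of why the baseline matters. Only the 775 percent comes from the statement quoted above; both baselines are made-up numbers for illustration, not Microsoft figures, and `apply_percent_increase` is a hypothetical helper.

```python
def apply_percent_increase(baseline: float, percent: float) -> float:
    """Return the value after growing `baseline` by `percent` percent."""
    return baseline * (1 + percent / 100)


PERCENT = 775  # the only number here taken from the quoted statement

# Hypothetical baselines (say, hours of calls per day); purely illustrative.
baselines = [
    ("seldom-used feature", 10),
    ("heavily used service", 1_000_000),
]

for label, baseline in baselines:
    after = apply_percent_increase(baseline, PERCENT)
    print(f"{label}: {baseline:,} -> {after:,.1f} (+{after - baseline:,.1f})")
```

Same percentage, but one case adds about 78 hours and the other adds millions, which is why the headline number alone tells you very little about load.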
> limited to places where shelter-in-place is in full force, which at least in the United States, is only a small part of the country with densely populated urban areas: New York, California, New Jersey, Michigan, etc.
Ummm, it is NOT a "small part" of the country that is under "shelter-in-place". By geographic area, maybe, but California, New York and New Jersey alone include some of the largest cities by population and population density. And there are now several other states with large populations (Illinois, Florida, Washington, for example) not listed here that have shelter-in-place as well.
UPDATE: I found an interactive map that shows which states have statewide shelter-in-place orders in the United States; it looks like all the states with the largest populations (and more than 50% of the country's total population) are covered, which is actually significantly larger than I had actually thought before I looked this up: https://www.nytimes.com/interactive/2020/us/coronavirus-stay...
The senate is there so the larger states don't trample over, say, Idaho. Majority rules does NOT work for the USA. You have to think of us as the EU with member votes for each country (country being state). This nation is not supposed to be about the federal government, it all went to shit around 9/11 and has continued to do so.
Even the House is slowly becoming less representative. The number of seats was capped at 435 back in the Taft era and states have rarely lost seats since[1].
With the 2020 Census likely in disarray due to both the Coronavirus and this administration's clear, documented intent to bias it "to Republicans and non-Hispanic whites,"[2] the crisis of representation will only worsen.
- Fugitive Slave Act (Federal government uses military force to kidnap people who were, according to the laws of the state they were in, free citizens and force them back into captivity)
It doesn't necessarily mean that the Senate arrangement is a good thing - it just happened to be the compromise they had to settle on to move forward with the Constitution back in the day. EC is a similar arrangement.
But one has to remember that this was back when there were 13 states, and the states themselves were very different from what they are today. So was the federal government, of course (and the change in that began long before 9/11). What may have been a reasonable compromise back then is not necessarily so reasonable today.
Interestingly, the Federalist papers, when arguing for the new Constitution, criticized the Articles of Confederation arrangement of one-state-one-vote - but that criticism is formulated broadly enough that in today's context, it's just as applicable to the Senate:
"Every idea of proportion and every rule of fair representation conspire to condemn a principle, which gives to Rhode Island an equal weight in the scale of power with Massachusetts, or Connecticut, or New York; and to Deleware an equal voice in the national deliberations with Pennsylvania, or Virginia, or North Carolina. Its operation contradicts the fundamental maxim of republican government, which requires that the sense of the majority should prevail. Sophistry may reply, that sovereigns are equal, and that a majority of the votes of the States will be a majority of confederated America. But this kind of logical legerdemain will never counteract the plain suggestions of justice and common-sense. It may happen that this majority of States is a small minority of the people of America; and two thirds of the people of America could not long be persuaded, upon the credit of artificial distinctions and syllogistic subtleties, to submit their interests to the management and disposal of one third. The larger States would after a while revolt from the idea of receiving the law from the smaller. To acquiesce in such a privation of their due importance in the political scale, would be not merely to be insensible to the love of power, but even to sacrifice the desire of equality. It is neither rational to expect the first, nor just to require the last. The smaller States, considering how peculiarly their safety and welfare depend on union, ought readily to renounce a pretension which, if not relinquished, would prove fatal to its duration."
Note the highlighted bit. At the time it was written, a simple majority of the states could be had with less than one-third of the total population represented by those votes. Today, a 75% supermajority of the states (as necessary to e.g. ratify constitutional amendments) can be had with less than 1/4 of the total population of the country. It's a very different balance, and given how much more important the Senate specifically is due to Supreme Court appointment, I'd say that the warning Hamilton gave to smaller states in the Confederation is once again fully in force today.
More specifically: every state has 2 US Senators, whereas Representatives in the US House of Representatives are allocated to states based upon population.
From the link, Google has seen the following for Meet usage:
“Over the last few weeks, Meet’s day-over-day growth surpassed 60%, and as a result, its daily usage is more than 25 times what it was in January. Despite this growth, the demand has been well within the bounds of our network’s ability.”
Is G Suite even in the same league as O365? I have no numbers and might be incredibly biased by location, but I feel that all the "boring" office companies in Europe have moved towards Office 365 as a natural step.
That is of course not true and I'm sure there are plenty of G-suite companies but for the large ones it seems very common that they have just followed Microsoft into the cloud.
Doesn't seem like a particularly balanced article. Plus some downright bizarre statements. GCP is "not big enough to be stress tested on the same level."??? Even smaller operations are subject to stress proportionately and I'm using a hell of a lot of google hangouts conferences...
No, this is why GCP is fairly crap when comparing to AWS. Google doesn't actually use GCP for building anything so they have less incentive to actually make it better, that or it means they don't understand the issues with it.
At least with AWS you know that Amazon dog-foods the hell out of it and have a major incentive to improve it.
Is that really the case though? I've never seen any discussion on how GCP is architected and distributed on the google network. Does a GCP server run next to one serving up google.com results?
This is really interesting, I always assumed it was exactly the same infrastructure but segregated off into customer instances. If anybody has more info here that would be really neat.
Where exactly are the "cracks" of which the author speaks? There are some links in the article pointing to generic MSDN help articles but I'm not seeing this claim substantiated by data (reports of downtime for example).
No one is saying online services and cloud providers aren't affected by the surge in traffic but I'm not seeing where the cracks are, so I'm calling the title BS (or clickbait at best).
Hate on M$ all you want, the world's enterprises run on their software (and on their cloud).
"It’s safe to say that AWS runs a much bigger, if not more critical, chunk of the digital world than Azure" noooo... it isn't. It's just that the people we know use AWS. Most people use office, sharepoint, outlook and exchange - especially exchange.
I think it's safe to say that more consumer services are using AWS than Azure, things like Spotify, Netflix, Snapchat, etc. But when it comes to large enterprises, I'd bet that Azure is in first place by a significant margin.
Given that these enterprises power a significant chunk of the world's supply chains, manufacturing processes and so on, I'd say that the piece of the digital world that Azure offers is way more critical.
Amazon always brags that they have twice as many Windows VMs running as Azure.
I’m by no means a Windows hater. I’ve never developed on anything professionally besides a Windows computer - these days .Net Core, Python and Node. But once I started working with AWS, I avoided deploying to Windows servers like the plague. Anytime you add Windows to the mix you basically more than double your cost between Windows licenses and the increased hardware required over running Linux.
All that to say, no one chooses to use Windows on AWS. So if there are that many Windows instances on AWS, they are more than likely old enterprise workloads.
Also, don't underestimate the amount of old, unsupported or ancient hardware that is still churning along in factories and supply companies, keeping the supply chain running.
It's terrifying how much of the world is held together by (sometimes literal) duct tape.
I have had a Service Down Ticket open with Azure now for four days without so much as a call back from them. Just today I finally got the automated email saying they acknowledge my issue. My system is totally dead in the water on Azure.
On the other hand amazon just called me to make sure everything was ok. Their system has been a rock.
The entirety of French universities are using Microsoft Teams for their online classes, that's a very serious workload. Previously almost no one used that tool.
One thing that probably helps AWS is that they have experience preparing for peak usage, given Amazon's (and others') peak traffic during certain events of the year, such as Black Friday, Prime Day, major holidays, etc.
Eh. A legion of non-remote, non-technology companies were forced to quickly thrust themselves onto the web. Want to bet how many of those companies were already running as a Microsoft shop? Nearly 100%.
I would love to see details on the total requests placed on AWS and Azure during the time frame of COVID. My instinct tells me that Azure got hammered, driven by those Microsoft shops scrambling to get into the cloud, while AWS had only a small increase, driven by technology companies (a small portion of companies) where the devs have final say and prefer AWS.
100% echo this statement. I have dozens of customers who went to VDI in Azure as their immediate work from home solution. I can't think of a single one that went to AWS. The article makes a bunch of baseless claims to get to an unproven conclusion.
>It’s safe to say that AWS runs a much bigger, if not more critical, chunk of the digital world than Azure.
Based on?? If O365 goes down for any extended period of time, most of the Fortune 100 can't do business.
> it generally lacks operational experience in running “always on” services and preparing for unanticipated traffic surges.
Ya... Microsoft lacks experience running always-on services... they'd only been managing hotmail for a decade before AWS was even in beta. They have no experience running always-on services AT ALL.
Wait until they get the bill :) I will say, it's amazing how quickly they find budget for that back-burnered VDI project after the first month of billing.
MS has more user-facing SaaS offerings than Amazon, so the two are not impacted the same way by the lockdown. What is the AWS equivalent of Teams & Office 365? Chime and WorkDocs? The comparison is apples to oranges.
When we started using EMR, we'd never exhaust the number of available instances (in both spot and on-demand). Now, we frequently do. And, for the last 3 weeks, it's been close to impossible to launch a huge cluster.