The Windows Shutdown crapfest (moishelettvin.blogspot.com)
77 points by adambyrtek on Sept 19, 2010 | hide | past | favorite | 31 comments



(2006)


Sorry, I forgot to mention the date in the title :(


The repository structure seems problematic in that it takes months for changes to propagate, but how else should a project as big as Windows be handled at the source level?


For an alternative, watch http://www.youtube.com/watch?v=sMql3Di4Kgc to see how Google does it.

Basically most of the company develops on one branch, but no changelist can be checked in without going through peer review. (The talk is about the tool that Google uses for peer review.) It seems to work quite well.

Now you can ask how the project size compares. Well http://en.wikipedia.org/wiki/Source_lines_of_code#Example claims that Windows Server 2003 had 50 million lines of code. http://code.google.com/opensource/ claims that Google has released over 15 million lines of code across various projects. I guarantee that Google has not released most of its code. Based on those data points I wouldn't be surprised if the sizes were comparable, at least to within an order of magnitude.

(However Google's source code is probably significantly better. Google has not been around as long, and does not have the same kinds of accumulated legacy issues that Microsoft does.)

Random disclaimer: I work for Google, and I like how Google does it.


Is it really fair to compare the amount of code in Windows Server to the total amount of code that Google maintains? Google has many different software projects; what is the biggest? Search? Android? Gmail? Compare one project to a server operating system and my guess is that the OS is going to be an order of magnitude larger. An OS is an enormously complicated piece of software with lots of interdependent parts. That's what led them to this pretty horribly complicated source management system.


Google's stuff may be a lot more modular than Windows, but that's not to say that there aren't inter-dependencies. Google's web services all share just a few back-end technologies (MapReduce, GFS, BigTable) that are comparable to an OS kernel, with things like Gmail and Docs hanging off like subsystems. But even things like Gmail, Docs, Calendar, Search, etc. aren't even close to independent. The Gmail people need to collaborate with the Docs people to handle attachments, the Calendar people to handle to-do lists and event invitations, and the GTalk and Voice people to implement the voice and video chat and phone calling. The Picasa people need to collaborate with the Google Earth and Maps people to handle geotagging, and probably with the GWT people to deal with the web interface. The Android people need to collaborate with everybody who works on the front ends of all the other products in order to develop mobile versions or integrate them more deeply into the OS.

Even if the "collaboration" in many of these instances is mostly limited to using an API published by the other group, there's still the need to make sure that those APIs are sufficient to accomplish the task at hand, and obviously Google's UI consistency between products is more than an accident. I think that to the extent that Microsoft's code is more tightly coupled than Google's or that of a desktop Linux distribution, it's mostly their own fault.


Android is kept in Git, not the main archive.


That really would not work very well for building large apps, IMO.

I'm sure Windows does code reviews for checkins to all branches. That's not sufficient, unless you have an oracle that can tell you if each checkin will work just fine.

One big problem I see:

When you're working on a feature you don't want a thousand checkins per day coming in. Some checkin downstream that doesn't directly affect your feature can end up blocking you. For example, I'm working on the start menu button, but some random guy checks in a new feature that broke the ATI Radeon 9850x video card, which happens to be the video card I'm using. Now I'm blocked for a day.


That really would not work very well for building large apps, IMO.

You are hardly alone in that opinion. However you've probably never seen what happens in practice. I have. And I value that relevant experience more than your opinion.

I'm sure Windows does code reviews for checkins to all branches. That's not sufficient, unless you have an oracle that can tell you if each checkin will work just fine.

I don't know what Microsoft's code review policy is. It would surprise me if they forced code review before allowing developers to check code in. It would fit my expectations if they do code review before merging branches together. But those are educated guesses based on the industry as a whole, and could both be far off base.

That said, I agree that manual code review is not sufficient. Nor are automated unit tests. Both help and are important pieces, but you need other pieces as well to make this work.

One big problem I see:

When you're working on a feature you don't want a thousand checkins per day coming in. Some checkin downstream that doesn't directly affect your feature can end up blocking you. For example, I'm working on the start menu button, but some random guy checks in a new feature that broke the ATI Radeon 9850x video card, which happens to be the video card I'm using. Now I'm blocked for a day.

Why would you think that you are blocked for a day? Implicit in there is an assumption that there is a nightly build process, giving you a one day feedback loop. In that case if the nightly build is broken, you're in trouble. And with a full day of check-ins, that is likely to happen on a lot of days.

To that assumption I'll reply that if you wish to use an iterative development process, you need to make iterations and feedback as quick as possible. See http://martinfowler.com/articles/continuousIntegration.html for some advice on how to do that. Also look to your build tools. For instance something like http://code.google.com/p/distcc/ can let your builds go MUCH faster, which means that the gap between "oops" and "all better" can be much shorter.
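To make that concrete, here is a rough sketch (in Python) of the kind of continuous-build loop I mean. The build.sh/run_tests.sh commands are just placeholders for whatever your project actually uses, and a real setup would use a proper CI server rather than a polling script:

    # Minimal continuous-build loop (illustrative only): poll for new
    # revisions, build, run the tests, and report immediately.
    import subprocess
    import time

    def current_revision():
        return subprocess.check_output(["git", "rev-parse", "HEAD"]).strip()

    def passes(cmd):
        return subprocess.call(cmd) == 0

    last_built = None
    while True:
        subprocess.call(["git", "pull", "--quiet"])
        rev = current_revision()
        if rev != last_built:
            ok = passes(["./build.sh"]) and passes(["./run_tests.sh"])
            status = "OK" if ok else "BROKEN - fix or roll back now"
            print("%s: %s" % (rev.decode(), status))
            last_built = rev
        time.sleep(60)  # the shorter this loop, the shorter the gap from "oops" to "all better"

The point isn't the script; it's that the gap between a bad check-in and somebody noticing should be minutes, not a day.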

Is this all simple to set up? No. Not at all. But it is doable, and I believe it is very much worthwhile in practice.


I've verified with some people I know who work on Windows. _Every_ checkin requires code review, except those in the individual dev branch (if the dev opts to create one). Merging also requires code review and additional suites of testing each level up the tree one merges.

"Why would you think that you are blocked for a day? Implicit in there is an assumption that there is a nightly build process, giving you a one day feedback loop"

Actually, I wasn't assuming a nightly build process. I was thinking of the time it would take for me to realize that the bug was actually in the new code, then for the dev to get a fix, have it code reviewed, have tests written, and then check it in. And given that maybe 1% of users broke because of it, they want to be careful that the fix doesn't unblock me but break a different 5% of users who are using the NVidia GeForce 9831. If you're working in one branch, my expectation is that you can pick up the fix as soon as it is checked in, but that fix is at minimum a day away in most cases.

My point is that when you have a thousand people doing checkins per day, the odds that someone breaks something go up a fair bit. Let me put it another way: the single branch is the simplest way to write code. The reason that so many places don't do it isn't because they like to make things complicated; it's because they've all had things go bad with that system.

With Google, my prediction is that Android/Chrome will eventually move to a system where there are branches for certain parts of the product. I think the rest of Google can probably work out of a single branch.


I've verified with some people I know who work on Windows. _Every_ checkin requires code review, except those in the individual dev branch (if the dev opts to create one). Merging also requires code review and additional suites of testing each level up the tree one merges.

Thank you for checking in to this. I'll try to remember that data point.

"Why would you think that you are blocked for a day? Implicit in there is an assumption that there is a nightly build process, giving you a one day feedback loop"

Actually, I wasn't assuming a nightly build process. I was thinking of the time it would take for me to realize that the bug was actually in the new code, then for the dev to get a fix, have it code reviewed, have tests written, and then check it in. And given that maybe 1% of users broke because of it, they want to be careful that the fix doesn't unblock me but break a different 5% of users who are using the NVidia GeForce 9831. If you're working in one branch, my expectation is that you can pick up the fix as soon as it is checked in, but that fix is at minimum a day away in most cases.

This is what continuous builds and regression tests are for. You can automatically identify which check-in caused something to break. If it broke something, anywhere, that check-in needs to be rolled back. No actual thought is required. The code worked without this check-in and failed with it, so we have a known-good state to revert to. Debug at leisure, but DON'T leave it broken for everyone else in the meantime.
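If you do need to hunt for the culprit, even that can be mechanical. A sketch of the idea (not any particular tool), bisecting over the check-ins between the last green build and the current red one, where is_good() stands in for "sync to this revision, build, run the regression suite":

    # Find the first bad check-in by bisection.
    def first_bad(revisions, is_good):
        """revisions[0] is known good, revisions[-1] is known bad."""
        lo, hi = 0, len(revisions) - 1
        while hi - lo > 1:
            mid = (lo + hi) // 2
            if is_good(revisions[mid]):
                lo = mid
            else:
                hi = mid
        return revisions[hi]  # roll this one back; debug it at leisure

    # Even 1,000 check-ins between green and red only takes about
    # 10 build-and-test cycles to pin down.

And with a continuous build you usually don't even need the bisection, because only a handful of check-ins land between one green build and the next.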

My point is that when you have a thousand people doing checkins per day, the odds that someone breaks something go up a fair bit. Let me put it another way: the single branch is the simplest way to write code. The reason that so many places don't do it isn't because they like to make things complicated; it's because they've all had things go bad with that system.

How many people do you think Google has checking code in? I'm not allowed to tell you, but I'm curious what you think it is.

Google has a lot of people who have a lot of experience in lots of companies of all sizes. We didn't settle on our current system out of ignorance of how other large projects work. But I'll completely agree that if you try to naively scale up smaller development processes with one branch, you'll fall over unless you address a bunch of other problems. Most people at Google believe that we have done so.

With Google, my prediction is that Android/Chrome will eventually move to a system where there are branches for certain parts of the product. I think the rest of Google can probably work out of a single branch.

My understanding is that Android and Chrome are both already in separate repositories so that they can interact better with outside groups. (Both are open source projects with outside developers.) Given that Android is in Git, I am sure that it has a ton of branches.

That said, the rest of Google is a much more complex product which is under very rapid development.


"This is what continuous builds and regression tests are for. You can automatically identify which check-in caused something to break."

That's only if you already have a test for it. The first time a checkin causes my machine to break, I don't know what caused it -- especially since there were 1,000 checkins since the last time I synced. I'm not even sure if it is a code change that messed up my system in some cases.

Once you have the regression test checked-in, that's great. But that's a small set of cases.

"How many people do you think Google has checking code in?"

Maybe a couple of thousand checkins per day? Not sure how many users are doing it though.

"That said, the rest of Google is a much more complex product which is under very rapid development."

The reason I called out Android and Chrome is that I think they're fundamentally different than the rest of Google and much more like Windows. Why? Several reasons:

1) Cloud services are in some ways easier to test. You can test them on the cloud configuration they get deployed to. There is no single configuration to test Windows on. There aren't ten configurations. There aren't 1,000 configurations. That new SSD HD, that new video card, audio card, CPU, chipset, tuner card, and combinations thereof, are all places where things can go wrong that you won't catch in the lab.

2) Building on (1) is the fact that Windows/Android/Chrome are layered systems. If something breaks in the bottom layer, everyone above it is screwed. While I'm sure Google has some of that, like I would imagine a lot sits on GFS, the fact that it can be tested really well in isolation on a known configuration makes it a MUCH more tractable problem.

But the guy who makes a breaking change in the SSD HD layer just screwed up the day for a chunk of people. Now the key here isn't that he broke everybody. That would certainly get caught in testing. The problem is that he broke 1% of the company or less. And someone else had a checkin that broke another 1% or less. And someone else had a different checkin that broke a different 1%. Given the number of checkins, you suddenly have 10% of the company trying to figure out what broke what.
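To put rough numbers on that (my own back-of-the-envelope, assuming each bad check-in breaks an independent, random 1% of developers):

    # If each of N bad check-ins breaks an independent random 1% of devs,
    # the fraction of devs hit by at least one break is 1 - 0.99**N.
    for n_bad in (1, 5, 10, 20):
        hit = 1 - 0.99 ** n_bad
        print("%2d bad check-ins -> ~%.1f%% of devs blocked" % (n_bad, 100 * hit))
    # 10 such check-ins already puts roughly 10% of the company in
    # "what broke my machine?" mode.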

Android and Chrome would have the same problem, but at a smaller scale. The core of Google is probably pretty immune to this, as is Bing and the Bing Services team.

I just don't think you can take a development model for cloud-based web services and use it for a desktop product like Windows.


"This is what continuous builds and regression tests are for. You can automatically identify which check-in caused something to break."

That's only if you already have a test for it. The first time a checkin causes my machine to break, I don't know what caused it -- especially since there were 1,000 checkins since the last time I synced. I'm not even sure if it is a code change that messed up my system in some cases.

What you say is true. But thanks to the Pareto principle, you find in practice that with well-factored software you need fewer tests than you'd imagine.

Once you have the regression test checked-in, that's great. But that's a small set of cases.

There are always cases nobody has caught yet. But they take up much less time than you'd imagine.

"How many people do you think Google has checking code in?"

Maybe a couple of thousand checkins per day? Not sure how many users are doing it though.

Sorry, but I just giggled.

"That said, the rest of Google is a much more complex product which is under very rapid development."

The reason I called out Android and Chrome is that I think they're fundamentally different than the rest of Google and much more like Windows. Why? Several reasons:

1) Cloud services are in some ways easier to test. You can test them on the cloud configuration they get deployed to. There is no single configuration to test Windows on. There aren't ten configurations. There aren't 1,000 configurations. That new SSD HD, that new video card, audio card, CPU, chipset, tuner card, and combinations thereof, are all places where things can go wrong that you won't catch in the lab.

If you have well-factored abstraction layers, each of which has good unit tests for the abstractions it expects from below and for the abstractions it promises to export above, the likelihood of random combinations breaking things goes way down. Sure, a random dangling pointer in one driver can scribble over random memory anywhere else, but there are techniques for catching that type of mistake.
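By "good unit tests for the abstractions" I mean something like a contract test: one set of tests that encodes what a layer promises, run against every implementation of that layer. A toy Python sketch, with made-up BlockDevice/RamDisk names purely for illustration:

    # Contract-test sketch: the contract class encodes what any block
    # device must do; each implementation mixes it in and must pass it.
    import unittest

    class RamDisk:
        """Trivial in-memory implementation so the contract can be exercised."""
        def __init__(self, blocks):
            self._blocks = [b""] * blocks
        def write(self, block, data):
            self._blocks[block] = data
        def read(self, block):
            return self._blocks[block]

    class BlockDeviceContract:
        def make_device(self):
            raise NotImplementedError

        def test_read_back_what_you_wrote(self):
            dev = self.make_device()
            dev.write(0, b"hello")
            self.assertEqual(dev.read(0), b"hello")

        def test_blocks_are_independent(self):
            dev = self.make_device()
            dev.write(0, b"aaaa")
            dev.write(1, b"bbbb")
            self.assertEqual(dev.read(0), b"aaaa")

    class RamDiskMeetsContract(BlockDeviceContract, unittest.TestCase):
        def make_device(self):
            return RamDisk(blocks=16)  # point this at any other implementation

    if __name__ == "__main__":
        unittest.main()

Run the same contract against the real driver, against the fake used by the layer above, and so on; random combinations get much less scary when every seam is pinned down like that.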

Also cloud services have similar challenges. There are a lot of possible combinations when you release software in 40 languages which needs to support a wide variety of browsers on multiple operating systems with different configurations to users who have different preferences enabled. It is not as bad as the number of combinations people run into with hardware, but it isn't a trivial challenge either.

2) Building on (1) is the fact that Windows/Android/Chrome are layered systems. If something breaks in the bottom layer, everyone above it is screwed. While I'm sure Google has some of that, like I would imagine a lot sits on GFS, the fact that it can be tested really well in isolation on a known configuration makes it a MUCH more tractable problem.

SOME of that? I just giggled again.

But the guy who makes a breaking change in the SSD HD layer just screwed up the day for a chunk of people. Now the key here isn't that he broke everybody. That would certainly get caught in testing. The problem is that he broke 1% of the company or less. And someone else had a checkin that broke another 1% or less. And someone else had a different checkin that broke a different 1%. Given the number of checkins, you suddenly have 10% of the company trying to figure out what broke what.

Yes, I know the theoretical nightmare. In practice it doesn't seem to be a problem.

In fact, the smaller and more rapid the integration cycle is, the less overall pain there seems to be associated with integration. It isn't that you do 100 integrations with 1% of the pain each time. It is that you do 100 integrations with 0.1% of the pain each time, which means your overall integration pain is a tenth of what it used to be.

Is there still integration pain? Of course there is. It is just less than with the process you are describing.

Android and Chrome would have the same problem, but at a smaller scale. The core of Google is probably pretty immune to this, as is Bing and the Bing Services team.

I strongly believe that you are over-estimating the natural immunity that Google has to this type of problem. I cannot judge how Bing compares to Google without knowing things I don't know about how Bing is organized internally.

I just don't think you can take a development model for cloud-based web services and use it for a desktop product like Windows.

For an interesting public example, look at the LLVM project. They produce critical software that OS X is dependent on, that operates in layers where each piece depends on the ones below it, that is multi-platform, and they do it by developing on one branch with rapid integration.

Admittedly they are much, much smaller than Windows or Google. But they give a public data point of interest.


"But thanks to the Pareto principle, you find in practice that with well-factored software you need fewer tests than you'd imagine."

Thanks to the "Devs are Human" principle, you never have the test coverage you thought you'd have. I've never worked on a project where we've thought, "Wow, turns out that we needed fewer tests than we thought."

"If you have well-factored abstraction layers, each of which has good unit tests for the abstractions it expects from below and for the abstractions it promises to export to above, the likelyhood of random combinations breaking things goes way down. Sure, a random dangling pointer from one driver can scribble over random memory for anywhere else, but there are techniques for catching that type of mistake."

Well-factored abstraction layers do help. That's the only way you write 100-million-line applications that run on over a billion computers with over 100,000 unique configurations. The problem is that with dev teams the size of the ones on Windows (or presumably at Google), even if each dev makes only one mistake every five years that takes half a day to resolve, the checked-in version is ALWAYS broken. Even with great testing, abstraction layers, etc., mistakes will happen -- sometimes for a given dev even more often than once every five years.
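Rough numbers behind that claim (my assumptions: one bad check-in per dev every five years, roughly 1,250 working days, each breakage living half a day before it's resolved):

    # Expected breakages per day, and how many are unresolved at any moment.
    WORKING_DAYS_PER_5_YEARS = 1250.0
    FIX_TIME_DAYS = 0.5
    for devs in (200, 1000, 2000, 5000):
        breaks_per_day = devs / WORKING_DAYS_PER_5_YEARS
        open_breaks = breaks_per_day * FIX_TIME_DAYS
        print("%5d devs -> %.1f breaks/day, ~%.1f unresolved at any moment"
              % (devs, breaks_per_day, open_breaks))
    # At a few thousand devs on a single branch, it is essentially never clean.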

"In fact the smaller and more rapid the integration cycle is, the less overall pain there seems to be associated with them. It isn't that you do 100 integrations with 1% of the pain each time. It is that you do 100 integrations with 0.1% of the pain each time, which means your overall integration pain is a tenth of what it used to be."

The only thing you've done here is require me to integrate 10x more often. Checking in and requiring a sync after you write each line of code doesn't fix the issue; it just makes the process a lot more tedious on the way to the same eventual problem.

"I strongly believe that you are over-estimating the natural immunity that Google has to this type of problem."

Why wouldn't cloud-based services be immune to it? Your millions of lines of code run over a small set of easily testable configurations. And based on what I've seen working on other, admittedly smaller-scale cloud services, this is exactly the case. It's just an easier problem. There's nothing wrong with that. In fact it's a great selling point.

"For an interesting public example, look at the LLVM project. They produce critical software that OS X is dependent on, that operates in layers where each piece depends on the ones below it, that is multi-platform, and do it with developing on one branch with rapid integration."

But LLVM is a product that is largely isolated from the system, right? It reads input from a very standard source and writes out output. gcc, Visual C++, and the Intel compiler are probably all single-branch systems too.

Linux is probably a better example, as they don't sit on an abstraction layer so much as they are the abstraction layer. Is that one branch for a whole distribution? Probably not, for at least the obvious reason that many parts of a distribution aren't even created by the Linux team.


Thanks, I'll check that video out.


VS2010 was the largest source repository I've ever worked on. It is structured very similarly to Windows; all large projects at MS are, as far as I know. My job ended up being almost entirely dedicated to maintaining our branches: pushing our updates up to Main, and pulling in fresh changes about once a week. DevDiv has a term for this: BFD, "Branch Facilitator Developer", which for most teams is a full-time job of nothing but merging code. For me it was about 70% of my job. Each branch took about 120 gigs of space on disk, and I regularly worked with about 5 branches.

Merging our changes into Main was a huge ordeal. We had to merge, do a full stack build (which took about 24 hours), and run every single team's tests, which took another 24 to 48 hours. From there it was a bookkeeping task of checking in on all the tests that failed and resolving them, doing performance testing, etc. All in all, merging to Main took roughly a week to pull off. The reverse was always interesting: pulling in new bits from Main to our branch was like playing Russian roulette -- what was going to break this time?

To add a little spice to all of this, it all took place on alpha builds of Team Foundation Server, which was being developed at the same time as everything else. Fun times :)


Well, creating a common vision would have been a good start, as would having had a single owner. There is a point where it becomes more valuable for managers to delegate and step away than it is for them to try and keep abreast of the minutiae. In order for delegation to work there must be comprehensive trust. In my experience, large organizations usually fail miserably at this combination of trust, vision, and delegation [at some point of scaling]. The better you are the better you can scale, but even revered learning organizations like 3M, Bell Labs, and Xerox PARC relied [mostly] on small, self-organizing organic project teams with very little management oversight... but oodles of charisma & responsibility.


Thinking about it, this looks like a perfect example of why continuous integration is such a great idea.


I love this article. It's practically a case study for Fred Brooks' "The Mythical Man-Month".


Again?


Interesting. I couldn't google the previous entry, and Hacker News didn't detect this as a duplicate either.

http://www.google.com/search?&q=site%3Anews.ycombinator....


This does look like the first submission, unless SearchYC is missing it: http://searchyc.com/submissions/moishelettvin.blogspot.com


I think it's Reddit and Slashdot where I saw it the first three times.


I just used this in a metaphor today, to explain why Windows is inferior to Mac OS and GNOME on Linux.


Ahh, this is why I love Linux and XFCE.... XFCE didn't take a year to write such a dialog box.


Any time I hear someone say they spent a year working on a few hundred lines of code, I have to wonder how much of the problem was other people, and how much was that person.

I suspect there's a ton of exaggeration going on here, and that's probably not the only code they worked on. It probably wasn't the highest priority, either.


I worked on an equivalently small feature on Longhorn/Vista and it took over 18 months for it to be completed. It also depended on shell code and it would take weeks for code to migrate up their tree and back down to us.

The fun thing is that two DLLs that I built for the feature ended up being standalone DLLs in Windows\System32. I thought someone would merge that code into some common DLL, but I guess that wasn't on the schedule.


Any time? Even when the logistics of the situation are explained this clearly, and the coder is responsible for 2% of the decision making process?

I really don't see where in this article there's room to chalk the dysfunction up to exaggeration or incompetence.


I'm not even going to tell you how long it took and how complex it sometimes was to replace a single icon in Vista. I worked in the Windows UX team at the time and tried really hard during Windows 7 to improve that process, because there were so many people involved and so many hours spent on such simple things. So yes, I believe what this guy is saying about his feature is possible - to swap an icon we sometimes had to track down a ridiculous number of people, wait several builds to see the change, and get approval from a room full of people, many of them several management levels above me.


I fought similar battles (but at a much smaller scale) trying to change some of the inane dialog box text/tooltips in InfoPath for a feature I owned as a PM. I was an intern, but the pushback was rarely along the lines of "Your version isn't better" but rather "We can't handle the globalization/internationalization/testing/etc. burden of making the change."


Yup. Which is even more frustrating. Many times you got everybody to agree that it would be better the way you were trying to do it, but for reasons that seem to be out of the control of everyone involved in the process, it can't be done. It's sad, actually: a company full of people with the right ideas and skills who can't put those ideas into practice because the operations and processes get in their way.



