That really would not work very well for building large apps, IMO.
You are hardly alone in that opinion. However, you've probably never seen what happens in practice. I have. And I value that relevant experience more than your opinion.
"I'm sure Windows does code reviews for checkins to all branches. That's not sufficient, unless you have an oracle that can tell you if each checkin will work just fine."
I don't know what Microsoft's code review policy is. It would surprise me if they forced code review before allowing developers to check code in. It would fit my expectations if they do code review before merging branches together. But those are educated guesses based on the industry as a whole, and could both be far off base.
That said, I agree that manual code review is not sufficient. Nor are automated unit tests. Both help and are important pieces, but you need other pieces as well to make this work.
"One big problem I see:"
"When you're working on a feature you don't want a thousand checkins per day coming in, because some checkin downstream that doesn't directly affect your feature can end up blocking you. For example, I'm working on the start menu button, but some random guy checks in some new feature that breaks the ATI Radeon 9850x video card, which happens to be the video card I'm using. Now I'm blocked for a day."
Why would you think that you are blocked for a day? Implicit in there is an assumption that there is a nightly build process, giving you a one day feedback loop. In that case, if the nightly build is broken, you're in trouble. And with a full day of check-ins, this is likely to happen on a lot of days.
To that assumption I'll reply that if you wish to use an iterative development process, you need to make iterations and feedback as quick as possible. See http://martinfowler.com/articles/continuousIntegration.html for some advice on how to do that. Also look to your build tools. For instance something like http://code.google.com/p/distcc/ can let your builds go MUCH faster, which means that the gap between "oops" and "all better" can be much shorter.
Is this all simple to set up? No. Not at all. But it is doable, and I believe it is very much worthwhile in practice.
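To make that feedback loop concrete, here is a minimal sketch of the poll-build-test loop a continuous integration setup boils down to. The vcs tip command, the make targets, and the distcc invocation are illustrative stand-ins, not any particular team's tooling:

    import subprocess
    import time

    def run(cmd):
        """Run a shell command; True means it exited cleanly."""
        return subprocess.run(cmd, shell=True).returncode == 0

    def current_revision():
        # Hypothetical: ask version control for the id of the tip revision.
        result = subprocess.run("vcs tip", shell=True,
                                capture_output=True, text=True)
        return result.stdout.strip()

    def main():
        last_built = None
        while True:
            rev = current_revision()
            if rev != last_built:
                # A distributed compiler wrapper (e.g. distcc) keeps the build
                # short, so the gap from "oops" to "all better" stays small.
                ok = run('make -j32 CC="distcc gcc"') and run("make test")
                print(rev, "PASS" if ok else "FAIL -- notify the author or roll it back")
                last_built = rev
            time.sleep(60)  # poll roughly once a minute

    if __name__ == "__main__":
        main()

A real setup adds queuing, notification, and a build farm on top, but the core loop is about this small.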
I've verified with some people I know who work on Windows. _Every_ checkin requires code review, except those in the individual dev branch (if the dev opts to create one). Merging also requires code review and additional suites of testing at each level up the tree one merges.
"Why would you think that you are blocked for a day? Implicit in there is an assumption that there is a nightly build process, giving you a one day feedback loop"
Actually, I wasn't assuming a nightly build process. I was thinking of the time it would take for me to realize that the bug was actually in the new code, then for the dev to get a fix, have it code reviewed, have tests written, and then check it in. And given that maybe 1% of users broke because of it, they want to be careful that a fix for me doesn't break a different 5% of users who are using the NVidia GeForce 9831. If you're working in one branch, my expectation is that you can pick up the fix as soon as it is checked in, but getting that fix takes at minimum a day in most cases.
My point is that when you have a thousand people doing checkins per day, the odds that someone breaks something go up a fair bit. Let me put it another way. The single branch is the simplest way to write code. The reason that so many places don't do it isn't because they like to just make things complicated; it's because they've all had things go bad with it.
With Google, my prediction is that Android/Chrome will eventually move to a system where there are branches for certain parts of the product. I think the rest of Google can probably work out of a single branch.
"I've verified with some people I know who work on Windows. _Every_ checkin requires code review, except those in the individual dev branch (if the dev opts to create one). Merging also requires code review and additional suites of testing at each level up the tree one merges."
Thank you for checking in to this. I'll try to remember that data point.
"Why would you think that you are blocked for a day? Implicit in there is an assumption that there is a nightly build process, giving you a one day feedback loop"
"Actually, I wasn't assuming a nightly build process. I was thinking of the time it would take for me to realize that the bug was actually in the new code, then for the dev to get a fix, have it code reviewed, have tests written, and then check it in. And given that maybe 1% of users broke because of it, they want to be careful that a fix for me doesn't break a different 5% of users who are using the NVidia GeForce 9831. If you're working in one branch, my expectation is that you can pick up the fix as soon as it is checked in, but getting that fix takes at minimum a day in most cases."
This is what continuous builds and regression tests are for. You can automatically identify which check-in caused something to break. If it broke something, anywhere, that check-in needs to be rolled back. No actual thought is required. The code without this check-in worked; with this check-in it failed; we have a known good state to revert to. Debug at leisure, but DON'T leave it broken for everyone else in the meantime.
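To sketch how mechanical that hunt can be: a binary search over the suspect check-ins finds the culprit without anyone reading code. checkins and build_and_test here are hypothetical hooks standing in for whatever the build farm actually provides:

    def find_culprit(checkins, build_and_test):
        """checkins[0] is known good, checkins[-1] is known bad.
        Returns the first check-in at which build_and_test(rev) fails."""
        lo, hi = 0, len(checkins) - 1      # last good index, first bad index
        while hi - lo > 1:
            mid = (lo + hi) // 2
            if build_and_test(checkins[mid]):
                lo = mid                   # still good: culprit comes later
            else:
                hi = mid                   # already broken: culprit is here or earlier
        return checkins[hi]

With 1,000 check-ins between the last good build and the broken one, that is about ten build-and-test cycles, not a thousand. Then you revert the culprit and let its author debug offline.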
"My point is that when you have a thousand people doing checkins per day, the odds that someone breaks something go up a fair bit. Let me put it another way. The single branch is the simplest way to write code. The reason that so many places don't do it isn't because they like to just make things complicated; it's because they've all had things go bad with it."
How many people do you think Google has checking code in? I'm not allowed to tell you, but I'm curious what you think it is.
Google has a lot of people who have a lot of experience in lots of companies of all sizes. We didn't settle on our current system out of ignorance of how other large projects work. But I'll completely agree that if you try to naively scale up smaller development processes with one branch, you'll fall over unless you address a bunch of other problems. Most people at Google believe that we have done so.
"With Google, my prediction is that Android/Chrome will eventually move to a system where there are branches for certain parts of the product. I think the rest of Google can probably work out of a single branch."
My understanding is that Android and Chrome are already both in separate repositories so that they can interact better with outside groups. (Both are open source projects with outside developers.) Given that Android is in git, I am sure that it has a ton of branches.
That said, the rest of Google is a much more complex product which is under very rapid development.
"This is what continuous builds and regression tests are for. You can automatically identify which check-in caused something to break."
That's only if you already have a test for it. The first time a checkin causes my machine to break, I don't know what caused it -- especially since there were 1,000 checkins since the last time I synced. In some cases I'm not even sure it was a code change that messed up my system.
Once you have the regression test checked-in, that's great. But that's a small set of cases.
"How many people do you think Google has checking code in?"
Maybe a couple of thousand checkins per day? Not sure how many users are doing it though.
"That said, the rest of Google is a much more complex product which is under very rapid development."
The reason I called out Android and Chrome is that I think they're fundamentally different than the rest of Google and much more like Windows. Why? Several reasons:
1) Cloud services are in some ways easier to test. You can test them on the cloud configuration they get deployed to. There is no single configuration to test Windows on. There aren't ten configurations. There aren't 1,000 configurations. That new SSD HD, that new video card, audio card, CPU, chipset, tuner card, and combinations thereof, are all places where things can go wrong that you won't catch in the lab.
2) Building on (1) is the fact that Windows/Android/Chrome are layered systems. If something breaks in the bottom layer, everyone above it is screwed. While I'm sure Google has some of that, like I would imagine a lot sits on GFS, the fact that it can be tested really well in isolation on a known configuration makes it a MUCH more tractable problem.
But the guy who makes a breaking change in the SSD HD layer just screwed up the day for a chunk of people. Now the key here isn't that he broke everybody. That would certainly get caught in testing. The problem is that he broke 1% of the company or fewer. And someone else had a checkin that broke another 1% or fewer. And someone else had a different checkin that broke a different 1%. Given the number of checkins, you suddenly have 10% of the company trying to figure out what broke what.
Android and Chrome would have the same problem, but at a smaller scale. The core of Google is probably pretty immune to this, as is Bing and the Bing Services team.
I just don't think you can take a development model for cloud-based web services and use it for a desktop product like Windows.
"This is what continuous builds and regression tests are for. You can automatically identify which check-in caused something to break."
"That's only if you already have a test for it. The first time a checkin causes my machine to break, I don't know what caused it -- especially since there were 1,000 checkins since the last time I synced. In some cases I'm not even sure it was a code change that messed up my system."
What you say is true. But thanks to the Pareto principle, you find in practice that with well-factored software you need fewer tests than you'd imagine.
"Once you have the regression test checked-in, that's great. But that's a small set of cases."
There are always cases nobody has caught yet. But they take up much less time than you'd imagine.
"How many people do you think Google has checking code in?"
"Maybe a couple of thousand checkins per day? Not sure how many users are doing it though."
Sorry, but I just giggled.
"That said, the rest of Google is a much more complex product which is under very rapid development."
"The reason I called out Android and Chrome is that I think they're fundamentally different than the rest of Google and much more like Windows. Why? Several reasons:"
"1) Cloud services are in some ways easier to test. You can test them on the cloud configuration they get deployed to. There is no single configuration to test Windows on. There aren't ten configurations. There aren't 1,000 configurations. That new SSD HD, that new video card, audio card, CPU, chipset, tuner card, and combinations thereof, are all places where things can go wrong that you won't catch in the lab."
If you have well-factored abstraction layers, each of which has good unit tests for the abstractions it expects from below and for the abstractions it promises to export to the layers above, the likelihood of random combinations breaking things goes way down. Sure, a random dangling pointer in one driver can scribble over memory belonging to anything else, but there are techniques for catching that type of mistake.
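As a toy illustration of what I mean by testing the abstractions a layer promises upward, here is a contract-test sketch. BlockStore is an invented example interface, not any real driver API:

    import unittest

    class BlockStore:
        """The abstraction a storage driver promises to the layer above it."""
        def write(self, block_id: int, data: bytes) -> None: ...
        def read(self, block_id: int) -> bytes: ...

    class InMemoryBlockStore(BlockStore):
        """A trivial implementation, useful as a stand-in for real hardware."""
        def __init__(self):
            self._blocks = {}
        def write(self, block_id, data):
            self._blocks[block_id] = bytes(data)
        def read(self, block_id):
            return self._blocks[block_id]

    class BlockStoreContract:
        """Tests that every implementation of BlockStore must pass."""
        def make_store(self) -> BlockStore:
            raise NotImplementedError
        def test_read_returns_what_was_written(self):
            store = self.make_store()
            store.write(7, b"hello")
            self.assertEqual(store.read(7), b"hello")

    class TestInMemoryBlockStore(BlockStoreContract, unittest.TestCase):
        def make_store(self):
            return InMemoryBlockStore()
        # A real SSD-backed implementation would get its own TestCase subclass
        # and run the exact same contract against real hardware.

    if __name__ == "__main__":
        unittest.main()

Every implementation, fake or real, runs through the same contract, which is a big part of why random combinations stop multiplying.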
Also cloud services have similar challenges. There are a lot of possible combinations when you release software in 40 languages which needs to support a wide variety of browsers on multiple operating systems with different configurations to users who have different preferences enabled. It is not as bad as the number of combinations people run into with hardware, but it isn't a trivial challenge either.
"2) Building on (1) is the fact that Windows/Android/Chrome are layered systems. If something breaks in the bottom layer, everyone above it is screwed. While I'm sure Google has some of that, like I would imagine a lot sits on GFS, the fact that it can be tested really well in isolation on a known configuration makes it a MUCH more tractable problem."
SOME of that? I just giggled again.
"But the guy who makes a breaking change in the SSD HD layer just screwed up the day for a chunk of people. Now the key here isn't that he broke everybody. That would certainly get caught in testing. The problem is that he broke 1% of the company or fewer. And someone else had a checkin that broke another 1% or fewer. And someone else had a different checkin that broke a different 1%. Given the number of checkins, you suddenly have 10% of the company trying to figure out what broke what."
Yes, I know the theoretical nightmare. In practice it doesn't seem to be a problem.
In fact the smaller and more rapid the integration cycle is, the less overall pain there seems to be associated with them. It isn't that you do 100 integrations with 1% of the pain each time. It is that you do 100 integrations with 0.1% of the pain each time, which means your overall integration pain is a tenth of what it used to be.
Is there still integration pain? Of course there is. It is just less than with what you are doing now.
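A back-of-envelope version of that arithmetic, where the growth exponent is purely an assumption chosen to illustrate pain growing faster than linearly with batch size:

    # Illustrative model only: integration pain grows superlinearly with the
    # number of changes in the batch, because conflicts and debugging interact.
    def pain(batch_size, exponent=1.5):
        return batch_size ** exponent

    big_bang = pain(100)             # one integration of 100 changes -> 1000.0
    one_small = pain(1)              # one integration of 1 change    -> 1.0
    total_small = 100 * one_small    # a hundred small integrations   -> 100.0

    print(one_small / big_bang)      # 0.001 -> "0.1% of the pain each time"
    print(total_small / big_bang)    # 0.1   -> "a tenth of what it used to be"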
"Android and Chrome would have the same problem, but at a smaller scale. The core of Google is probably pretty immune to this, as is Bing and the Bing Services team."
I strongly believe that you are over-estimating the natural immunity that Google has to this type of problem. I cannot judge how Bing compares to Google without knowing things that I don't know about how Bing is organized internally.
"I just don't think you can take a development model for cloud-based web services and use it for a desktop product like Windows."
For an interesting public example, look at the LLVM project. They produce critical software that OS X is dependent on, that operates in layers where each piece depends on the ones below it, and that is multi-platform, and they do it while developing on one branch with rapid integration.
Admittedly they are much, much smaller than Windows or Google. But they give a public data point of interest.
"But thanks to the Pareto principle, you find in practice that with well-factored software you need fewer tests than you'd imagine."
Thanks to the "Devs are Human" principle, you never have the test coverage you thought you'd have. I've never worked on a project where we've thought, "Wow, turns out that we needed fewer tests than we thought."
"If you have well-factored abstraction layers, each of which has good unit tests for the abstractions it expects from below and for the abstractions it promises to export to above, the likelyhood of random combinations breaking things goes way down. Sure, a random dangling pointer from one driver can scribble over random memory for anywhere else, but there are techniques for catching that type of mistake."
Well-factored abstraction layers do help. That's the only way you write 100 million line applications that run on over a billion computers with over 100,000 unique configurations. The problem is that with dev teams the size of Windows's (or presumably Google's), even if each dev makes a mistake that takes half a day to resolve only once every five years, the checked-in version is ALWAYS broken. Even with great testing, abstraction layers, etc., mistakes will happen -- sometimes for a given dev even more often than once every five years.
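To put rough numbers on that (the team size and working days here are illustrative assumptions, not real Windows or Google figures):

    devs = 2000                          # assumed size of the dev team
    mistakes_per_dev_per_year = 1 / 5    # one build-breaking mistake every 5 years
    working_days_per_year = 250
    hours_per_break = 4                  # "takes half a day to resolve"

    breaks_per_day = devs * mistakes_per_dev_per_year / working_days_per_year
    broken_hours_per_day = breaks_per_day * hours_per_break

    print(breaks_per_day)        # 1.6 breaking checkins per working day
    print(broken_hours_per_day)  # ~6.4 of 8 working hours with a broken tip

Even with nearly blameless individual devs, the tip spends most of every working day broken.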
"In fact the smaller and more rapid the integration cycle is, the less overall pain there seems to be associated with them. It isn't that you do 100 integrations with 1% of the pain each time. It is that you do 100 integrations with 0.1% of the pain each time, which means your overall integration pain is a tenth of what it used to be."
The only thing you've done here is require me to integrate 10x more often. Checking in and syncing after every line of code doesn't fix the issue; it just makes the process a lot more tedious before we hit the same eventual problem.
"I strongly believe that you are over-estimating the natural immunity that Google has to this type of problem."
Why wouldn't cloud-based services be immune to it? Your millions of lines of code run over a small set of easily testable configurations. And based on what I've seen working on other, admittedly smaller scale cloud services, this is exactly the case. It's just an easier problem. There's nothing wrong with that. In fact it's a great selling point.
"For an interesting public example, look at the LLVM project. They produce critical software that OS X is dependent on, that operates in layers where each piece depends on the ones below it, that is multi-platform, and do it with developing on one branch with rapid integration."
But LLVM is a product that is largely isolated from the system, right? It reads input from a very standard source and writes out output. gcc, Visual C++, and the Intel compiler are probably all single-branch systems as well.
Linux is probably a better example, as they don't sit on an abstraction layer so much as they are the abstraction layer. Is that one branch for a whole distribution? Probably not, for at least the obvious reason that many parts of a distribution aren't even created by the Linux team.