One thing that amazes me about most of the commercial codebases I have seen (leaked or not) is their sheer size. How can anyone need 1GB of source code for something like an anti-virus program?
In the large codebases I've seen, much of the size is often taken up by third-party code.
If we depend on a third-party library, we download its source and build it for every supported target: x86 and x64, debug and release, Windows and Linux. This can increase the size of your codebase very quickly.
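The targets listed above multiply rather than add: two architectures × two configurations × two platforms means every library is built eight times. A minimal Python sketch of that build matrix, assuming only the three axes named above (the variable and function names are illustrative, not from any real build system):

```python
from itertools import product

# The three build axes named above; each third-party library is built
# once per combination of architecture, configuration, and platform.
ARCHS = ["x86", "x64"]
CONFIGS = ["debug", "release"]
PLATFORMS = ["windows", "linux"]

def build_matrix():
    """Return every (arch, config, platform) target a library is built for."""
    return list(product(ARCHS, CONFIGS, PLATFORMS))

targets = build_matrix()
for arch, config, platform in targets:
    print(f"{platform}-{arch}-{config}")
print(len(targets), "builds per library")  # prints "8 builds per library"
```

Each additional axis (a second compiler, say, as a hypothetical example) multiplies the count again.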
On top of that come the static and dynamic libraries built from third-party code, plus code imported from external repositories.
Code that's been around for a long time, with a team of developers working on it for a decade or more, tends to be big. That's the way it is with commercial software: you more or less have to keep adding features to compete, and heavy refactoring to eliminate code is dangerous because it risks breaking backwards compatibility. So old and new code paths end up living side by side, which introduces another form of duplication.
Still, 1GB of source, and only source, is pretty big. The source for RAD Studio (the product I work on) alone is nearly 10 million lines, about 350 million characters (roughly 350MB, about a third of that 1GB), and that doesn't include the C++ compiler, C RTL, debugger kernel, and a bunch of other things.
It's not limited to commercial code. Many codebases for complex applications are very large, and the translation files full of strings can often account for 50% of the total on their own.
Well, the Linux kernel source is around 71MB compressed; I'd guess maybe 200MB uncompressed. That's quite a difference from 1GB, and I think the same is true of most OSS projects. The WINE project, for instance, is also under 100MB for its full source, which is huge and extensive and includes translations.
I think commercial codebases just accumulate a lot of cruft that nobody ever feels like cleaning up (plus, there is an incentive to keep things a bit clunky, since it buys slack-off time and/or extra hourly pay). As mentioned above, I also think they pull in other commercial (often crappy) components, such as third-party widgets, that got the same treatment, so it all snowballs into something huge and unwieldy.