The unfortunate excerpts from this post that stood out for me:
> Without access to a lot of source code I can’t tell exactly what is going on, but...
> Until Microsoft fixes their compiler to not load their web browser it seems impossible to avoid this problem when doing lots of parallel builds.
In my past life developing applications and services on Windows, this scenario was too common — something is broken, you can't quite tell why because you don't have the source, and you can't do anything but wait for it to be fixed or find some hacky workarounds (which is tougher to do without the source).
This has been the real win of switching to an open source stack for me: you can take your tools apart, look inside them, and fix any issues. I know Microsoft has come a long way on this in the past few years, but it still sucks to be stuck in that kind of situation.
> Fixing bugs in a program with 7 million lines of code is not easy.
It's not like you have to read all 7 million lines of code. The author of the article already has a stack trace; all he'd need to do is skim the functions along that trace. In this case it sounds highly likely that the fix is to comment out some code somewhere that is initializing a resource that isn't actually needed.
Very true. Being able to fix tools and libraries is such a change. I'll never go back to Microsoft (or any other closed-source, proprietary) development if I can help it.
May I ask what kind of F/OSS software bugs you have found and fixed (as a user) that you would have been unable to fix in proprietary software? I don't need a list; I'd just like to hear a description of the bug.
My story: MongoDB by default pre-allocates hundreds of MB for its databases, but I had a situation where I was instantiating lots and lots of small databases that only needed to hold a kB or two of data each. Command-line flags could only get the prealloc down to 32-ish MB. By twiddling some constants in the Mongo source code I got it down to the few kB I wanted.
Granted, MongoDB is not meant to be used for small databases (the name comes from hu-mongo-us), but the app I was using was written against Mongo and changing that would have been a whole lot harder than modifying some constants.
This is a specific example that happened not long after I abandoned Windows development altogether (I'd been using RoR since 2007 and Linux since 1995, but hadn't gone full time with both except through my own company):
I think the true enlightenment in this article (for me, at least) is that Windows's tracing framework is far more powerful than I had realized. Being able to capture a system-level trace is an incredibly valuable tool.
Until recently, I had thought that the only real tools for this kind of analysis lived on Solaris (DTrace). (As I recall, SystemTap gets part of the way, but does not reach far enough into userspace to really get a good view of what's going on -- is that still true?) It would be very interesting to augment Windows's performance tracing tools with some of the other things that DTrace has to offer -- pervasive low-overhead trace scripting seems like it would have made some of the other subsequent analyses that Bruce was working on much easier.
Ah, the wonders of XML. Somehow somebody decided to use XML to store some text data. Because after all you already have a parser for this in the approved tools. So why bother using something different?! And look it does all the cool stuff like namespaces and validation and amazingly it can even fetch remote DTDs by loading IE...
I hate XML as much as the next guy but in this case I would blame a poorly designed (or maybe very misused) API.
There's no reason why any file parsing library should end up fetching remote data without being explicitly asked to do so. Actually, fetching those files shouldn't even be the library's concern; an XML library has no business doing networking. It's a security concern and maintenance hell.
This is "XML speak" for an "include" statement to something like cpp, with the exception that this "include" could end up performing remote network fetches to acquire that which is being included.
So, technically, to be a proper, standards compliant XML parser, the parser has to at least submit requests to "fetch" these entities to the higher level code using the library, and let that code decide what to do about the "includes".
As to why Microsoft's implementation is the way it is, absent a Raymond Chen blog post explaining the why, we can only guess.
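That said, an application that drives MSXML itself can refuse external resolution up front. A minimal sketch against the MSXML 6 DOM (illustrative only -- the file name is made up, and this is not what Visual Studio actually does internally):

    // Minimal sketch: load an XML file with MSXML 6 while refusing to
    // resolve external entities (DTDs, remote schemas, etc.).
    // Error handling is omitted for brevity.
    #include <windows.h>
    #include <msxml6.h>
    #include <comdef.h>

    int main() {
        CoInitializeEx(nullptr, COINIT_APARTMENTTHREADED);
        IXMLDOMDocument2* doc = nullptr;
        CoCreateInstance(__uuidof(DOMDocument60), nullptr,
                         CLSCTX_INPROC_SERVER, IID_PPV_ARGS(&doc));

        doc->put_async(VARIANT_FALSE);
        doc->put_validateOnParse(VARIANT_FALSE);   // no DTD/XSD validation
        doc->put_resolveExternals(VARIANT_FALSE);  // never fetch external entities

        VARIANT_BOOL ok = VARIANT_FALSE;
        doc->load(_variant_t(L"settings.xml"), &ok);  // hypothetical local file
        // ... use the document ...

        doc->Release();
        CoUninitialize();
        return 0;
    }

With resolveExternals off, the parser has no reason to fetch anything beyond the local file it was handed.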
Ever tried looking inside the documents saved by LibreOffice or MS Office? They all use XML now. An ODT document with only two words in it has this beauty at the start of the content.xml inside the ODT:
The problem is, whenever you have a link somewhere and it is assumed that it should occasionally be refreshed, how can you know that you shouldn't load the more current version? If you're writing something like a DLL or library, why not leave it to the expert: let IE try to fetch it, and if it has already fetched it, it will return it from its own cache! Brilliant, problem solved! Except when that happens from 144 instances all the time, and IE needs to create some windows at startup -- which is what Bruce seems to have managed to trigger.
"Given that namespaces have definitive material, and that such definitive material is typically available on the Web, and that namespace names may be "http:"-class URIs, it is a grievous waste of potential if it is not possible to use the namespace name in retrieving the definitive material."
And in order to do all the processing and transformations popular at the time, copies of the documents specified by those URIs have to exist somewhere. Bruce detected some of them being loaded from documents stored locally in the DLLs.
The number of times I've fixed bugs that were a direct consequence of this is simply astounding.
Two favourites:
1) App never started because it couldn't access the internet to fetch a DTD/XSD.
2) Sun/Oracle removed XSDs and the app refused to start.
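The usual cure for both is to never touch the network for DTDs/XSDs at all: ship local copies and resolve references to them yourself. A rough sketch of that idea with Xerces-C++ (the class name, the matching rule, and the file paths are all made up for illustration):

    // Rough sketch: a Xerces-C++ entity resolver that serves a bundled
    // local copy of a schema instead of fetching it over HTTP.
    #include <string>
    #include <xercesc/framework/LocalFileInputSource.hpp>
    #include <xercesc/parsers/XercesDOMParser.hpp>
    #include <xercesc/sax/EntityResolver.hpp>
    #include <xercesc/util/PlatformUtils.hpp>
    #include <xercesc/util/XMLString.hpp>

    using namespace xercesc;

    class LocalCatalogResolver : public EntityResolver {
    public:
        InputSource* resolveEntity(const XMLCh* const /*publicId*/,
                                   const XMLCh* const systemId) override {
            char* sysId = XMLString::transcode(systemId);
            InputSource* result = nullptr;
            if (std::string(sysId).find("w3.org") != std::string::npos) {
                // Serve our bundled copy instead of going to the network.
                XMLCh* local = XMLString::transcode("schemas/local-copy.xsd");
                result = new LocalFileInputSource(local);
                XMLString::release(&local);
            }
            XMLString::release(&sysId);
            return result;  // nullptr falls back to the default behaviour
        }
    };

    int main() {
        XMLPlatformUtils::Initialize();
        {
            XercesDOMParser parser;
            LocalCatalogResolver resolver;
            parser.setEntityResolver(&resolver);
            parser.setLoadExternalDTD(false);  // and don't fetch external DTDs
            parser.parse("config.xml");
        }  // parser must be destroyed before Terminate()
        XMLPlatformUtils::Terminate();
        return 0;
    }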
> There's no reason why any file parsing library would end up fetching remote data
It's not the API; it is part of many of the specs, as it was thought to be a good idea at one point. Specifically, many standardization efforts involved additional definitions located on HTTP servers, for example:
Which was "poised to play a central role in the future of XML processing, especially in Web services where it serves as one of the fundamental pillars that higher levels of abstraction are built upon."
So you had to implement it to be "conforming," and then avoid the overheads with "optimizations." Ironically, the /analyze feature isn't optimized.
The reason this wasn't discovered earlier is that the Visual Studio editions which contain that option cost thousands of dollars more (I don't know what the currently cheapest version containing /analyze is -- does anybody know?).
You need to reduce the parallelism in your build. If it is possible in your build setup to be running 144 concurrent builds, you are thrashing your caches. This isn't an optimization. You need to reduce the number of projects being built concurrently (or even the total number of projects). While I advocate a bit of over-allocation, you have it in the extreme, and this is really not optimal at all.
Also, you need to start using precompiled headers if you can -- sometimes this requires structuring your projects correctly to avoid lots of little projects.
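For reference, the conventional MSVC precompiled-header arrangement looks roughly like this (the stdafx naming is just the usual convention, and widget.h is a placeholder):

    // stdafx.h -- put the heavy, rarely-changing includes here so they
    // are parsed once per project rather than once per .cpp file.
    #pragma once
    #include <windows.h>
    #include <map>
    #include <string>
    #include <vector>

    // stdafx.cpp -- compiled with /Yc"stdafx.h" to create the .pch file.
    #include "stdafx.h"

    // every other .cpp in the project -- compiled with /Yu"stdafx.h" so
    // the cached parse result is reused instead of re-reading the headers.
    #include "stdafx.h"
    #include "widget.h"   // project-local headers still go after the PCH

    void BuildWidgets() { /* ... */ }

The win is that the expensive system and library headers are compiled once per project instead of once per translation unit.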
Even with large projects I usually have really low compilation times for C++.
Here is a blog post I wrote about the right way to do this:
A standard test for scheduler responsiveness in Linux is to fire up a kernel compile with unlimited jobs[0]. If responsiveness is lost, the scheduler isn't ready for prime time. You think Microsoft wants the answer to this issue to be "we can't even match Linux"?
[0] Well, it used to be. I assume they've moved on to more demanding tests now, since hardware has far outstripped kernel build complexity.
The issue isn't responsiveness during the build for me (that isn't usually an issue; my build machine is generally responsive), it is compile performance from start to finish. If you over-allocate to the extreme you will thrash your CPU caches and this will slow everything down. You want to minimize CPU context switches (thread switches) while also ensuring that your CPUs are running at nearly 100% -- it is a difficult balance, but unlimited jobs and focusing on the responsiveness of another task is not the solution to maximizing start-to-finish compile time for a large project.
I think the article author is quite aware that too much parallelism will slow down the build, but he still wanted to investigate why it makes his computer so unresponsive.
Speed of build isn't the issue, responsiveness is. Throughput is commonly sacrificed in favor of interactive performance, otherwise we'd all be running some modern equivalent of punchcard batch jobs.
Sounds a bit like apples and oranges. If in your test GCC happened to stall because of a bug in some shared library in user mode, would you blame the scheduler?
Maybe this isn't exactly a great argument since I know Linus has been known to take the side of not ever breaking user mode wrt kernel changes (probably rightfully so) but it is possible to write buggy code, including hangs, that runs on Linux. That would be the analogous scenario.
They're not testing for GCC stalling, they're testing for user interaction stalling -- the same thing the OP is complaining about.
Strictly speaking, the direct cause of OP's problem appears to be outside the kernel, but it is system-level. Microsoft has created a stack in which low-priority tasks can trivially destroy the responsiveness of interactive tasks unrelated to the build process.
Is this the same thing as with Visual Studio compiling 12 independent projects using 12 threads each? Does the kernel compile with unlimited jobs keep the number of concurrently running jobs below the processor limit, or are additional jobs in any way blocked on cheap primitives, such as waiting for previous jobs with a mutex?
The issue here is simply that he was trying to compile 12 projects simultaneously, each with every logical processor he has. There might be some slack in the system, but is each project really 92% wait time?
> Does the kernel compile with unlimited jobs keep the number of concurrently running jobs below the processor limit, or are additional jobs in any way blocked on cheap primitives, such as waiting for previous jobs with a mutex?
How do you think threads were implemented by the OS back when we all had one processor with one core?
I don't understand your question. I'm asking if the Linux kernel compile with unlimited jobs oversubscribes compute resources in the same way that compiling 12 independent projects does.
He's asking whether the number of jobs is artificially limited to the number of processors, not what happens when the number of jobs is greater than the number of processors.
I think you're missing the point. The problem described in the article is the expected behavior of the indirectly loaded component (mshtml.dll), which creates a message loop (a COM STA requirement); because the compiler runs as a low-priority process, this message loop causes a priority inversion which affects the entire system. There are many wrong things here:
1. COM uses a message loop and hidden windows to serialize calls (see the sketch after this list). This is pretty stupid IMO, but I'm pretty sure Raymond Chen would find millions of good reasons why it's done this way.
2. Layering problem (XML parser uses mshtml.dll etc etc).
3. Why keep config in an XML file? Isn't XML so 1990s?
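On point 1: a single-threaded apartment receives cross-apartment COM calls as window messages posted to a hidden window, so any thread that enters an STA has to pump messages or everything marshalled to it stalls. A bare-bones illustration of the pattern (this is the general STA shape, not Visual Studio's actual code):

    // Bare-bones STA illustration: CoInitializeEx with
    // COINIT_APARTMENTTHREADED makes this thread a single-threaded
    // apartment. COM delivers cross-apartment calls to it via window
    // messages, so the thread must run a message loop.
    #include <windows.h>
    #include <objbase.h>

    DWORD WINAPI StaThread(void*) {
        CoInitializeEx(nullptr, COINIT_APARTMENTTHREADED);

        // ... create STA-bound objects here (anything mshtml-based qualifies) ...

        // Without this loop, calls marshalled into this apartment -- and
        // any SendMessage aimed at its hidden windows -- simply block.
        MSG msg;
        while (GetMessage(&msg, nullptr, 0, 0) > 0) {
            TranslateMessage(&msg);
            DispatchMessage(&msg);
        }

        CoUninitialize();
        return 0;
    }

Multiply that hidden-window creation by 144 low-priority compiler processes and you get exactly the kind of contention on shared desktop resources that the article describes.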
I think Bruce already knows plenty about making builds fast ;-)
The linked article demonstrates some ignorance as well by noting that command-line MSBuild builds are slower than from Visual Studio. I'd hazard a guess and say that they didn't find the /m option.
I wrote the linked article, so I take responsibility for any ignorance. It was some time ago (three years), so it may be that we were not using the /m option with MSBuild at the time.
To my knowledge VS caches the MSBuild projects (.targets etc) while it's running. So it's possible that, even if VS and MSBuild have the same degree of parallelism, not having to parse the MSBuild XML gives VS an edge.
> You need to reduce the parallelism in your build.
I looked at your blog post and it recommends using /MP and setting num. parallel project builds to num. cores.
In other words, you recommend exactly what I was doing!
We do use precompiled header files. And we get great build times. It's only with /analyze builds that the system locks up.
Yes, 144 parallel compiles is excessive, but as your article says, 12-way parallel compilation and 12-way parallel project builds are both necessary to expose all possible parallelism. Otherwise there will be times when many CPUs are idle. Until VS gets a global scheduler to avoid oversubscribing the CPUs, there is no way to get full parallelism without sometimes oversubscribing.
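A "global scheduler" here essentially means a machine-wide job gate, something in the spirit of GNU make's jobserver. A rough sketch of the idea using a named Win32 semaphore (the semaphore name and the slot count of 12 are invented for illustration):

    // Rough sketch of a machine-wide job gate: every worker acquires a
    // slot from one named semaphore before doing CPU-heavy work, so the
    // total concurrency across all projects and processes stays bounded.
    #include <windows.h>

    class JobSlot {
        HANDLE sem_;
    public:
        JobSlot() {
            // Same name in every process => one machine-wide pool of 12 slots.
            sem_ = CreateSemaphoreW(nullptr, 12, 12, L"Local\\BuildJobSlots");
            WaitForSingleObject(sem_, INFINITE);   // block until a slot frees up
        }
        ~JobSlot() {
            ReleaseSemaphore(sem_, 1, nullptr);    // give the slot back
            CloseHandle(sem_);
        }
    };

    void CompileOneFile(const wchar_t* /*sourcePath*/) {
        JobSlot slot;   // at most 12 of these are held concurrently, system-wide
        // ... invoke the actual compilation here ...
    }

Which is exactly the gap being described above: 12 projects times /MP can still explode into 144 simultaneous compiler instances.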
I'm surprised that you find SSDs that helpful. With enough RAM they should offer only modest improvements to compilation, because source files and header files get cached.
(ie that MSXML is calling URLMon, with unpredictable results). There's a comment at the bottom of that thread that suggests how to fix the bug if it's in your own code, but not if it's in Visual Studio...
edited to add: related bugs crop up in many programs that use xml parsers, not just MS. Some combination of disabling validation, loading of external entities, or adding an entity resolver is usually needed (see eg these options for Xerces http://xerces.apache.org/xerces-j/features.html). The symptom there was usually errors in production that didn't occur on the developer's box because in production the network connection to grab the schema didn't work...or worse, a SOAP service that will fetch external entities in messages.
I wouldn't say that VS loads "Internet Explorer", that's a bit of an exaggeration. iexplore.exe is Internet Explorer, mshtml.dll holds the DOM implementation it's using [1,2]. Now that's still close to a web browser, but one may also use Trident to work e.g. with generic XML documents. I don't know if VS does that and whether this is a good idea or not though.
He does claim that it creates a window, even if that's a conceptual one rather than a physical one. Hence the lock contention in the desktop window manager.
This is one of the things about Microsoft APIs that gets me. They all depend on each other in weird ways that have less to do with practicality than they do with getting people on board to use the various Microsoft APIs. For example, DirectX depends on COM. It doesn't have to, but they're both Microsoft technologies so why not? You're going to need to learn how to use COM anyway; it's the future. Systemd is designed this way too; that's why I hate it, despite the fact that it won. It seems architected with a "getting people on board with using certain tools" mindset rather than orthogonally providing functionality.
But this... this takes the dependency tangle to whole new levels of comedy.
I am not sure whether it is still the case, but the Scala compiler used to have a dependency on Swing. The reason was that one of the potent -Y flags gave you access to an AST browser after the phase of your choice. Very convenient tool for compiler plugin developers, but a questionable dependency for most.
(Edit: seems to have been taken away at least as of 2.10)
If true, this implies that recent versions of IE compiled with this compiler needed older versions of IE to compile themselves. IE is probably the first browser to have the honor of bootstrapping itself.
Sounds like it might be worth it to write a drop-in replacement DLL that doesn't parse XML, or at least in a way that doesn't require mshtml. The XML files that are being parsed appear to be static resources within the analyze DLL itself. A third party might even be able to patch the DLL to have pre-parsed resources instead of XML.
C# compilation is seriously fast compared to C++ and the like (due to its proper, statically typed nature), and this allows the compiler to assert certain things quickly and not waste any more time on them.
Given how .NET developers love separating things into tons of DLLs and libraries (and how easy .NET makes this), I would be surprised if this part of the compilation wasn't as optimized as the rest of the process.
C# compilation is fast in part because of the removal of headers and the replacement of them by well defined assemblies with interfaces. C# compilation is fast for the same reason Java compilation is fast. :)
C++ compilation is sort of screwy in that to compile a new C++ file the compiler actually has to re-compile all the definitions of the libraries you are using. It is insanely inefficient compared to just using explicitly defined and immutable API interfaces on assemblies/modules.
C# sure compiles fast, but if you have several projects in one solution the majority of the time is spent shoveling dll's around like there's no tomorrow. SSD helps only a bit. :-/
My friend who I'd call the closest thing to a template programmer summarized the situation as such:
"The core tenant of C++ has been to move work from runtime to to compile time."
Both RAII (avoiding use of new & delete) and generics are strengths of C++ but also require heavy compile time work.
RAII needs decent analysis and optimization or your program will thrash around copying structs by value. Generics and templates force your compiler to run a second compilation inside your frontend.
This, on top of C's copy/paste-based include system and macros, gives you what should be one of the slowest-to-compile procedural languages you could design.
RAII doesn't necessarily mean avoiding new and delete. It just means cleaning up in destructors when you use new in the constructor or initialiser list or elsewhere. In particular, RAII means grabbing what you need in the constructor rather than haphazardly creating items in member functions.
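A minimal illustration of that framing -- the resource is grabbed in the constructor and released in the destructor, so every exit path cleans up (the FileHandle class is just an example, not from any particular codebase):

    // Minimal RAII illustration: acquire in the constructor, release in
    // the destructor, so cleanup happens on every exit path, including
    // exceptions.
    #include <cstdio>
    #include <stdexcept>

    class FileHandle {
        std::FILE* f_;
    public:
        explicit FileHandle(const char* path)
            : f_(std::fopen(path, "rb")) {
            if (!f_) throw std::runtime_error("open failed");
        }
        ~FileHandle() { std::fclose(f_); }           // cleanup lives here
        FileHandle(const FileHandle&) = delete;      // owning type: no copies
        FileHandle& operator=(const FileHandle&) = delete;
        std::FILE* get() const { return f_; }
    };

    void ReadHeader(const char* path) {
        FileHandle file(path);    // grab what you need in the constructor
        char buf[16];
        std::fread(buf, 1, sizeof buf, file.get());
    }                             // fclose runs no matter how we leave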
Long time Microsoft and Windows app developer here. I just thought somebody should point out the obvious: that this is not even remotely unusual for a Microsoft app on a Microsoft operating system. You're just describing the normal architecture of the OS and the way apps work.