I hate XML as much as the next guy but in this case I would blame a poorly designed (or maybe very misused) API.
There's no reason why any file parsing library would end up fetching remote data without being explicitly asked to do so. Actually, it shouldn't even be the library's concern to fetch those files; an XML library has no business doing networking. It's a security concern and a maintenance hell.
This is "XML speak" for an "include" statement to something like cpp, with the exception that this "include" could end up performing remote network fetches to acquire that which is being included.
So, technically, to be a proper, standards-compliant XML parser, the parser has to at least surface the requests to "fetch" these entities to the higher-level code using the library, and let that code decide what to do about the "includes".
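That hand-off is exactly what SAX-style APIs expose. A minimal sketch in Java (the class name and the fake DTD URL are mine, purely for illustration): the parser asks the calling code about every external entity, and the calling code simply refuses to fetch anything.

    import java.io.StringReader;
    import javax.xml.parsers.SAXParserFactory;
    import org.xml.sax.InputSource;
    import org.xml.sax.XMLReader;

    public class NoNetworkParse {
        public static void main(String[] args) throws Exception {
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();

            // The parser hands every external entity/DTD request to us instead of
            // deciding on its own; we log it and answer with an empty document,
            // so nothing is ever fetched from the network.
            reader.setEntityResolver((publicId, systemId) -> {
                System.out.println("Parser asked for: " + systemId);
                return new InputSource(new StringReader(""));
            });

            reader.parse(new InputSource(new StringReader(
                    "<!DOCTYPE doc SYSTEM \"http://example.com/some.dtd\"><doc/>")));
        }
    }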
As to why Microsoft's implementation is the way it is, absent a Raymond Chen blog post explaining the why, we can only guess.
Ever tried looking inside the documents saved by LibreOffice or MS Office? They all use XML now. An ODT document with only two words in it has, at the start of the content.xml inside the ODT, this beauty:
The problem is, whenever you have some link anywhere and it's assumed that it should be refreshed sometimes, how can you know that you shouldn't load the more current version? If you're writing something like a DLL or a library, why not leave it to the expert: let IE try to fetch it, and if it has already fetched it, it will return it from its own cache! Brilliant, problem solved! Except when that happens from 144 instances all the time, and IE needs to create some windows at startup, which is what Bruce seems to have managed to trigger.
"Given that namespaces have definitive material, and that such definitive material is typically available on the Web, and that namespace names may be "http:"-class URIs, it is a grievous waste of potential if it is not possible to use the namespace name in retrieving the definitive material."
And in order to do all the processing and transformations popular at the time, copies of the documents specified by those URIs have to exist somewhere. Bruce detected some loads of documents that are stored locally, inside the DLLs.
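Shipping local copies and mapping the well-known URIs onto them is, in spec terms, what XML catalogs are for. This isn't what MSXML does internally, but the same idea expressed with the JDK's javax.xml.catalog API, as a rough sketch (the catalog path and DTD URL are made up):

    import java.net.URI;
    import javax.xml.catalog.CatalogFeatures;
    import javax.xml.catalog.CatalogManager;
    import javax.xml.catalog.CatalogResolver;
    import javax.xml.parsers.SAXParserFactory;
    import org.xml.sax.XMLReader;

    public class CatalogBackedParse {
        public static void main(String[] args) throws Exception {
            // catalog.xml (hypothetical) maps remote identifiers to files on disk, e.g.
            //   <catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
            //     <system systemId="http://example.com/some.dtd" uri="dtds/some.dtd"/>
            //   </catalog>
            CatalogResolver resolver = CatalogManager.catalogResolver(
                    CatalogFeatures.defaults(), URI.create("file:///opt/app/catalog.xml"));

            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            // Every DTD/entity "fetch" now goes through the catalog, so the
            // well-known documents are served from local copies, never the network.
            reader.setEntityResolver(resolver);
        }
    }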
The number of times I've fixed bugs as a direct consequence of this is simply astounding.
Two favourites:
1) App never started because it couldn't access the internet to fetch a DTD/XSD.
2) Sun/Oracle removed XSDs and the app refused to start.
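The usual fix for bug 1) is to tell the parser up front that external DTDs and schemas are off limits. A rough sketch with plain JAXP (the settings below are the standard JAXP 1.5 properties and SAX feature names; how a particular framework exposes them may differ):

    import javax.xml.XMLConstants;
    import javax.xml.parsers.DocumentBuilderFactory;

    public class OfflineXmlFactory {
        public static DocumentBuilderFactory newOfflineFactory() throws Exception {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();

            // Forbid resolving any external DTDs or schemas, so a dead network
            // (or an XSD that vanished from someone else's server) can't block startup.
            dbf.setAttribute(XMLConstants.ACCESS_EXTERNAL_DTD, "");
            dbf.setAttribute(XMLConstants.ACCESS_EXTERNAL_SCHEMA, "");

            // Belt and braces: also switch off external entity expansion.
            dbf.setFeature("http://xml.org/sax/features/external-general-entities", false);
            dbf.setFeature("http://xml.org/sax/features/external-parameter-entities", false);
            return dbf;
        }
    }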
> There's no reason why any file parsing library would end up fetching remote data
It's not the API; it's part of many of the specs, as this was once thought to be a good idea. Specifically, many standardization efforts involved additional definitions located on HTTP servers. One example:
Which was "poised to play a central role in the future of XML processing, especially in Web services where it serves as one of the fundamental pillars that higher levels of abstraction are built upon."
So you had to implement it to be "conforming," and then add "optimizations" to avoid the overheads. Ironically, the "/optimize" feature isn't itself optimized.
The reason this wasn't discovered earlier is that the Visual Studio editions which contain that option were priced at thousands of dollars (I don't know what the cheapest version currently containing "/optimize" is -- does anybody know?).