Used sensibly, XML isn't too bad. But there's a whole lot of cruft in the standard that seems to do nothing except make it harder to use. Part of this is a problem with popular libraries rather than inherent to the format, but we judge a thing by its ecosystem rather than in isolation. So: namespaces are a pain, making it much harder than it should be to just make my XPath work. DTDs are annoying, especially when a production system breaks because the remote server hosting a DTD goes down and your parser now refuses to load a file. User-defined entities seem pointless, and though most parsers can handle the billion laughs attack these days, it wasn't always so. The handling of text nodes is confusing; whitespace is irrelevant except when it isn't. Specifying the encoding inside the document itself seems wrong, and supporting multiple encodings at all causes trouble (e.g. sometimes it's simply impossible to include one document inline in another).
Is XML Schema really so much better than, e.g., JSON Schema?
To me it feels like there's an impedance mismatch between the kind of structures XML lends itself to and the kind of structures programs are good at dealing with. So for program-to-program communication with a certain level of validation, I find Protocol Buffers a much better fit. Conversely, in cases where human readability is really important, XML isn't good enough compared to JSON.
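To make the earlier "whitespace is irrelevant except when it isn't" point concrete, here is a minimal sketch using Python's standard `xml.etree.ElementTree`. The two tiny documents are made up for illustration; they differ only in indentation, yet the parsed text nodes differ.

```python
import xml.etree.ElementTree as ET

# Two hypothetical documents that differ only in pretty-printing.
pretty = ET.fromstring("<p>\n  <b>bold</b>\n  tail text\n</p>")
compact = ET.fromstring("<p><b>bold</b>tail text</p>")

# Text before the first child: the indentation shows up as data.
pretty_text = pretty.text      # newline plus two spaces
compact_text = compact.text    # None, there is no text node at all

# Text after the <b> element is stored on the child as its "tail".
pretty_tail = pretty[0].tail   # includes the surrounding newlines and indent
compact_tail = compact[0].tail

print(repr(pretty_text), repr(compact_text))
print(repr(pretty_tail), repr(compact_tail))
```

Whether that extra whitespace matters depends entirely on what the consuming code does with `text` and `tail`, which is roughly the complaint.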
> So: namespaces are a pain, making it much harder than it should be to just make my xpath work.
Namespaces exist to solve a real-world problem that comes up in real-world use cases (SVG embedded in HTML, HTML embedded in RSS). It would be nice to look at something complex and say "the trivial cases would be simpler without this feature", but in reality other common use cases then become more complex, or even impossible in the general case, which makes it a very short-sighted benefit. Namespace prefixes are really not that difficult to configure, and once configured, XPath makes them very easy to use :/.
The biggest caveat with namespaces is that most people have never bothered figuring out how they work. For example, the number of applications I've seen that hardcode namespace prefixes instead of looking up the namespace URI is horrifying.
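A minimal sketch of the difference, reading the complaint as being about hardcoded prefixes and using Python's standard `xml.etree.ElementTree`. The two documents and the `count_rects` helper are hypothetical; the point is that the query is keyed on the namespace URI, so it works whatever prefix the document author picked.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"

# Two equivalent documents: same namespace URI, different prefix choices.
DOC_A = '<svg:svg xmlns:svg="http://www.w3.org/2000/svg"><svg:rect/></svg:svg>'
DOC_B = '<s:svg xmlns:s="http://www.w3.org/2000/svg"><s:rect/></s:svg>'


def count_rects(xml_text):
    """Count direct <rect> children in the SVG namespace (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    # The prefix "x" is local to this query; only the URI has to match.
    return len(root.findall("x:rect", {"x": SVG_NS}))


print(count_rects(DOC_A), count_rects(DOC_B))
```

Code that instead searched for the literal string `svg:rect` would handle the first document and silently miss the second.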
Namespace prefixes are not that difficult to configure once you know about them. But if you're just starting with XML, probably because you need to extract some information from a document you've been sent, you don't want to learn the theory of XML; you want to get the data you need out and get on with adding business value. So you find a tutorial, you write an XPath, and it doesn't work. You try removing the foo: prefixes in your XPath, and it still doesn't work. This is not the experience that a technology should give new users. A default of matching while ignoring namespaces would not make anything impossible.
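The failure mode described here is easy to reproduce. The sketch below uses Python's standard `xml.etree.ElementTree` (which supports only a subset of XPath) and a made-up Atom-style document with a default namespace. The naive path matches nothing, the namespace-aware path works, and on Python 3.8 or later the `{*}` wildcard gives roughly the "ignore namespaces" behaviour asked for.

```python
import xml.etree.ElementTree as ET

# Hypothetical document using a default namespace (no prefixes in the markup).
DOC = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry><title>First</title></entry>
  <entry><title>Second</title></entry>
</feed>"""

root = ET.fromstring(DOC)

# The naive path finds nothing: every element lives in the Atom namespace.
naive = root.findall("entry/title")

# Binding a prefix of our own choosing to the namespace URI makes it match.
ns = {"a": "http://www.w3.org/2005/Atom"}
qualified = [t.text for t in root.findall("a:entry/a:title", ns)]

# Python 3.8+ also accepts a {*} wildcard that matches any namespace.
wildcard = [t.text for t in root.findall("{*}entry/{*}title")]

print(naive, qualified, wildcard)
```

Other libraries handle this differently, so treat the wildcard as specific to this module rather than something XPath itself offers.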
Indeed. XML gets a lot of hate because it's so difficult to use. It would be fine if you could ignore the 100 features you don't care about and just use the ones you need, but pretty much every library I've seen makes parsing (or generating) a document a huge and complicated task, and most of it is completely irrelevant to the problem I'm trying to solve.
And because of this, almost no one bothers to handle it properly, so you often can't use the advanced features even if you want to.
This varies greatly from framework to framework, and language to language. On the JVM at least, the dark machinery that handles the XML is rather rigorously correct. Parsing and generation are trivial, especially using JAXP. You have multiple ways of working with XML (objects, DOM, push, pull).
XML is "good enough" for a lot of cases. There are lots of tools to mess around with it too, which is really quite valuable when you're experimenting with various kinds of data or debugging. Being able to extract the stuff you're interested in as XML means you can perform a lot of complex manipulations quite easily.