
Why would anyone choose to use XML over JSON, other than for RSS?



I can parse/print XML (using either an in-memory parser or a streaming parser), use XML Schema to validate XML and XPath expressions to select the parts I need, and get automatic object mapping, all with the standard library and without a single external dependency in Java. I don't know why I would use JSON over XML unless I had very good reasons to do so.
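
For illustration, a minimal sketch of that workflow using nothing but the JDK (note.xml, note.xsd and the /note/to path are made-up placeholders):

    import java.io.File;
    import javax.xml.XMLConstants;
    import javax.xml.parsers.DocumentBuilderFactory;
    import javax.xml.transform.dom.DOMSource;
    import javax.xml.validation.SchemaFactory;
    import javax.xml.xpath.XPathConstants;
    import javax.xml.xpath.XPathFactory;
    import org.w3c.dom.Document;

    public class XmlDemo {
        public static void main(String[] args) throws Exception {
            // Parse with the in-memory (DOM) parser.
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().parse(new File("note.xml"));

            // Validate the parsed document against an XML Schema.
            SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new File("note.xsd")).newValidator()
                    .validate(new DOMSource(doc));

            // Select just the part we need with an XPath expression.
            String to = (String) XPathFactory.newInstance().newXPath()
                    .evaluate("/note/to/text()", doc, XPathConstants.STRING);
            System.out.println(to);
        }
    }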

For me the only thing JSON does better is that it maps directly to commonly used data structures: arrays and maps.


> For me the only thing JSON does better is that it maps directly to commonly used data structures: arrays and maps.

This is nice, but it's also kind of a pain, as it makes you stop and think about which structured data elements it can support directly and which ones you have to send your own metadata over the wire for and then reconstruct on your own. For example: Dates. Which is a shame. If there is one data element I want help with serializing/deserializing, it's freaking Dates. All the other ones are super easy in comparison. There's just way too much subjective, dirty, human culture tied up in Dates.
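
(The usual workaround, sketched in Java, is to agree out of band on a convention like ISO 8601 strings, because JSON itself gives you nothing here:)

    import java.time.Instant;

    // JSON has no Date type, so the wire format is just a string, and
    // both ends must agree on the convention (here ISO 8601 / RFC 3339).
    String wire = Instant.now().toString();   // e.g. "2016-05-04T12:34:56.789Z"
    Instant parsed = Instant.parse(wire);     // the receiver has to know to do this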

The only thing that I think is objectively better in all cases about JSON over XML is the less verbose end-structure syntax. I think XML only has Attributes because Tags have this silly need to state their name both as they enter and exit the room.

    <Tag1 attr1="hello" attr2="world">
      <Tag2>how</Tag2>
      <Tag2>are</Tag2>
      <Tag2>you?</Tag2>
    </Tag1>
For simple, unnestable data elements, having the more efficient Attribute starts to look attractive.

If XML were more of the form:

    <Tag1 attr1="hello" attr2="world">
      <Tag2>how</>
      <Tag2>are</>
      <Tag2>you?</>
    </>
It's actually only one additional required character compared to specifying the attribute value as an element instead.

    <Tag1>
      <attr1>hello</>
      <attr2>world</>
      <Tag2>how</>
      <Tag2>are</>
      <Tag2>you?</>
    </>
Heck, why stop there? Do we really need to have quite all of those angle brackets now? How about we just get rid of all the ones we can assume:

    <Tag1
      <attr1 hello>
      <attr2 world>
      <Tag2 how>
      <Tag2 are>
      <Tag2 you?>
    >
And finally, who even likes angle brackets? I've never enjoyed the dual duty they play as delimiters in XML and operators in other languages. Let's use a common set delimiter, something like square brackets or maybe parentheses.

    (Tag1
      (attr1 hello)
      (attr2 world)
      (Tag2 how)
      (Tag2 are)
      (Tag2 you?))
Now where have I seen this before?

PS: JSON that is as nearly equivalent as I can make it is not much less verbose than the original XML, and requires some level of convention to make up for the differences:

    {Tag1: { attr1: "hello", attr2: "world", children: [
      {Tag2: "how"},
      {Tag2: "are"},
      {Tag2: "you?}]}
Though I'm sure in common practice it'd have a lot of the original metadata of the XML version thrown away:

    {attr1: "hello", attr2: "world", children: [
      "how",
      "are",
      "you?"]}


The XML tag style is much, much easier to work with when you're dealing with markup. And XML's purpose is to be an Extensible Markup Language. It's way more appropriate than JSON or S-expressions for that.

(Do you prefer to write HTML documents as S-expressions?)


> The XML tag style is much, much easier to work with when you're dealing with markup.

Having explicit end tags makes it harder to produce well-formed documents, because closing tags can clash with each other.

Consider a very typical sort of HTML error:

    <table>
      <tr>
        <td>
        </tr>
      </td>
    </table>
Interleaving tags is never correct, yet XML's syntax lets us write it (and I've seen it happen a lot).

The comparable S-Expr shows how it is just plain impossible to interleave tags:

    (table
      (tr
        (td)))
You might ask yourself, "which close paren closes which list?" if the document were particularly gnarly. But if we're talking about particularly gnarly documents, then XML can be just as ambiguous. At that point you'd be using a text editor that highlights matching parens for you, just as much as you'd be using one that highlights matching start and end tags.


This is not correct XML; a parser will throw an error. It is not correct HTML either. The only reason this code is likely to produce good-enough output in a browser is that browsers try really hard to produce something readable even from complete garbage.


Yes, I know it's not valid XML. If you had read my post and not just skimmed the examples, you would have seen that that was the point. XML's verbose end-tag feature makes it possible to produce malformed documents in a way that is just plain impossible with S-expressions.


Such interleavings can actually be valid HTML5, in that the specification defines an algorithm for parsing that handles such "tag soup" in a reasonable way.


That's not the same thing as making interleaving valid.


What's the difference?


I've never had trouble with that.


> Do you prefer to write HTML documents as S-expressions?

Actually, yes. I use CL-WHO[1] a lot, in which one can write:

    (:html
     (:head
      (:title "Foo bar")
      (:link :rel "stylesheet" :href "baz.css"))
     (:body
      (:p "Lorem ipsum dolor sit amet, consectetur adipiscing elit.
    Vestibulum ullamcorper efficitur purus, at suscipit nunc luctus vitae.")
      (:ol
       (:li "Cras vel est accumsan, malesuada leo eu, iaculis nulla.")
       (:li "Proin nec mi feugiat, posuere enim in, vehicula erat.")
       (:li "Morbi vitae purus nec neque posuere pharetra ultricies in nibh.")
       (:li "Nam maximus lectus faucibus, ullamcorper lectus aliquam, aliquam lectus."))))
Which I contend is prettier than the equivalent HTML.

[1] http://weitz.de/cl-who/


I like S-expressions too, especially when generating markup programmatically. For hand-writing HTML/XML documents, which I do quite a lot, I really enjoy the tag style because of the verbose end tags and the ease of moving blocks. It's at least nice enough to make me annoyed when people claim the tag syntax is some horrible stupid disaster compared to S-expressions or (worse) JSON.


> For hand-writing HTML/XML documents, which I do quite a lot, I really enjoy the tag style because of the verbose end tags and the ease of moving blocks.

I can't say anything about liking verbose tags, which seems to me a matter of taste, but moving S-expression blocks around is easy: C-SPC to set the mark, C-M-f to move forward one S-expression, C-w to cut the current region, navigate to where one wants it, C-y to yank the cut region.

Granted, this is using emacs, which really had better have good S-expression-editing capabilities after 40 years!


I also quite like Dylan's way of ending blocks, letting you type for example "end method do-stuff" so you can see clearly what's being ended, which is useful in a document with long sections.

And I like that XML block moving is even manageable with ed, which I actually use sometimes. Well, and vi.


Yes. With good tooling (such as Emacs), markup is much more pleasant to write in S-expressions than XML.


I'll add some more reasons to the flamebait:

- JSON doesn't have namespaces, making integration of different data-sources quite hard.

- XML allows me to do versioning within documents.

- An extremely large corpus of well-tested libraries is available.

- As opposed to JSON, XML and accompanying standards (XSLT, XML Schema, XPath, XQuery) are extremely well documented.

- XML validation, parsing and processing can happen at the same time, allowing streaming solutions. Using the XML schema, a parser can be created which is optimized for a specific stream of data.
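
To sketch that last point, here's roughly what it looks like with Java's StAX pull parser (big.xml and the "record" element are placeholders):

    import java.io.FileInputStream;
    import javax.xml.stream.XMLInputFactory;
    import javax.xml.stream.XMLStreamConstants;
    import javax.xml.stream.XMLStreamReader;

    // Events are pulled off the stream one at a time, so the
    // whole document never has to be in memory at once.
    XMLStreamReader r = XMLInputFactory.newInstance()
            .createXMLStreamReader(new FileInputStream("big.xml"));
    while (r.hasNext()) {
        if (r.next() == XMLStreamConstants.START_ELEMENT
                && r.getLocalName().equals("record")) {
            // process one record, then move on
        }
    }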

(edit: formatting)


All right, I'll bite:

>JSON doesn't have namespaces, making integration of different data-sources quite hard.

Yeah, and how often do you merge two data formats like that into one data format in a way that doesn't require massive transformations anyway?

>XML allows me to do versioning within documents

Well, that's fantastic. Because XML is designed for DOCUMENTS. But JSON is designed as a wire protocol, and a data exchange format, which is very different. You shouldn't use JSON for documents, and I very much doubt that's what OP was talking about.

>An extremely large corpus of well-tested libraries are available.

For your language, maybe. But XML is fairly complex, and there are a lot of environments with no support. JSON parsing is so simple that the complete grammar and semantics fit on the front page of the website, and it's unlikely your language doesn't already have support for it.

>As opposed to JSON, XML and accompanying standards (XSLT, XML Schema, XPath, XQuery) are extremely well documented.

The grammar and semantics are on the front page of the site. And JSON is simple enough that there's little more to it than that.

>XML validation, parsing and processing can happen at the same time, allowing streaming solutions. Using the XML schema, a parser can be created which is optimized for a specific stream of data.

Really? That's actually kinda cool. :-)


Alright, in the flamebait fashion, I'll bite back :)

> Yeah, and how often do you merge two data formats like that into one data format in a way that doesn't require massive transformations anyway?

Actually, quite a lot in the past! Back in 2009 I did some XProc pipelining of messages. These pipelines were a bit like reactive streams, which were (mostly) agnostic of the contents. This allowed me to combine, dissect and route streams of data in an intuitive way. Maybe you can compare it with mapping over a collection: you don't care what's inside, but you want to preserve the contents. XProc was kind of a functional programming + reactive approach to data processing. Pretty cool and ahead of its time, if you ask me.

> But JSON is designed as a wire protocol

Reference, please? Even if I subscribe to one definition of 'wire protocol' on the internet (there are many), I don't think it creates a meaningful distinction between XML and JSON.

> JSON parsing is so simple

Actually, it is, and it isn't. Yes, there are very few primitives (strings, booleans, numbers, arrays, objects), but this also causes important limitations. For example, it is rather cumbersome and unspecified to transfer binary data in a JSON document (base64 encoding). Another thing: how easy is it to parse a streaming JSON document in Javascript?

> The grammar and semantics are on the front page of the site.

Admittedly, that's a lot easier than, say, https://www.w3.org/TR/xml11/ (these guys really took it too far...)

> That's actually kinda cool.

That's what I thought too when I first heard about it :)


Okay, back to me:

>Actually, quite a lot in the past! Back in 2009 I did some XProc pipelining of messages. These pipelines were a bit like reactive streams, which were (mostly) agnostic of the contents. This allowed me to combine, dissect and route streams of data in an intuitive way.

Huh. So like this:

  |xmlstream|->|transformer|->|xmlstream|
Pretty slick. So the namespacing allowed you to add new tags without worrying about tripping over the old ones? Cool, but the types of transforms you can do without knowing the internals of the XML you're transforming are fairly limited. And because JSON's objects don't mandate an app-wide meaning for a key (the closest thing JSON has to XML tags), you can just attach the new data to a new dict, and the problem solves itself. If you're merging objects and each gives a different value for a key, then you can set up either an array or an object to hold both, or just send along both objects wrapped in an array/object like before: in essence, by JSON's semantics, each object is its own namespace.

>Reference, please? Even if I subscribe to one definition of 'wire protocol' on the internet (there are many), I don't think it creates a meaningful distinction between XML and JSON.

References, I can give. json.org, first paragraph:

  JSON (JavaScript Object Notation) is a lightweight data-interchange format. 
I apologize for being unclear: Data Interchange format is what I meant.

XML was not intended to be a generic data-interchange format: it, like HTML, SGML, and GML before it, was designed for DOCUMENT markup: human-readable, structured, semantic DOCUMENTS. It has since been pressed into service as a data-interchange format, and it's a testament to how well it was designed that it works as well as it does for that, but its verbosity and general format and layout make it ill-suited to the purpose. JSON was designed for data interchange: I said wire protocol because data interchange is often about sending data between applications on a network, which is what a wire protocol is for.

Hopefully some of that answers your question.

>Actually, it is, and it isn't. Yes, there are very few primitives (strings, booleans, numbers, arrays, objects), but this also causes important limitations. For example, it is rather cumbersome and unspecified to transfer binary data in a JSON document (base64 encoding). Another thing: how easy is it to parse a streaming JSON document in Javascript?

Is it specced to transfer binary data in XML? First I'd heard of it. Base64, uuencode, hex, or raw numbers: there are plenty of ways to encode binary data in JSON, and if you're using any system that has reserved characters (like CDATA in XML, if that's what you're thinking of), then you have to do this sort of encoding somehow. Besides, you could always send the JSON as a header and have the app get the binary data from a different endpoint. Although you may want Base64 to avoid the round-trips...

As for parsing streaming JSON, I don't know if there are any libraries for it, but the implementation should be very simple. Like XML, JSON is a tree, so parser state can be represented as a stack: you see a {, you're now in an object; a [, you're in an array; a , indicates adding a new value to the current array, or a new k/v pair to the current dict. What each character means is deterministic given what came before, so you can construct JSON as the data comes in and provide access to each value as it becomes ready. Although, given most JS implementations' multithreading limitations, all this really does is ensure that you don't have to have the entirety of the data in memory before you start parsing. Which is a good idea...
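
(A minimal sketch of that stack idea in Java; it only tracks nesting and treats string contents as opaque, and a real parser would also buffer scalars and emit values, but it shows that the state between chunks really is just a stack plus two flags:)

    import java.util.ArrayDeque;
    import java.util.Deque;

    // Feed chunks in as they arrive off the wire; the only state kept
    // between chunks is the container stack and two string-mode flags.
    class JsonNestingTracker {
        private final Deque<Character> stack = new ArrayDeque<>();
        private boolean inString, escaped;

        void feed(CharSequence chunk) {
            for (int i = 0; i < chunk.length(); i++) {
                char c = chunk.charAt(i);
                if (inString) {                        // string contents are opaque
                    if (escaped) escaped = false;
                    else if (c == '\\') escaped = true;
                    else if (c == '"') inString = false;
                } else if (c == '"') {
                    inString = true;
                } else if (c == '{' || c == '[') {
                    stack.push(c);                     // entering an object/array
                } else if (c == '}' || c == ']') {
                    stack.pop();                       // a container just completed
                    if (stack.isEmpty())
                        System.out.println("top-level value complete");
                }
            }
        }
    }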


wrt the XProc pipelining: it's been some time, but I recall various possible transforming and matching steps, such as conditional streams, transformers, reduction steps over multiple streams, and others. These could then be combined with XPath, XQuery, XSLT and even SOAP requests. The problem with XProc was similar to the rest of the XML era: it required too much of the implementer to understand. Also, good programmer tooling, such as graph editors for the pipelines, was missing. Perhaps this is slightly similar to functional programming and category theory nowadays: the ideas are sound, but they demand too much study and yield too little profit. Also, to properly work with category theory in programming, it would be very nice to have some graphical tools to view the transformations applied to your code.

Now for the flamebait-y parts. Yes, XML has loads of archaic SGML syntax bits, DTD built in (hopeless for small parsers), and the attribute/sub-element divide has never been completely solved. But it can be argued that JSON had a similar fate: it literally descended as a subset of ECMAScript. This also explains the lack of separation between integers and floats.

I agree, XML is definitely not the best data-interchange format, but neither is JSON. Some LISPy syntax would be my preference for data interchange if it needs to be readable. But I'm trying to argue that this doesn't really matter: the XML era is mostly over, and I'd say we should try to learn from 'the good parts'.

I conflated XML with the XML Schema datatypes, and I shouldn't have, but it has been some time since I last seriously worked with XML. Also, we should consider the whole ecosystem, not just structure. XML Schema actually does spec binary data (http://www.datypic.com/sc/xsd/t-xsd_hexBinary.html).

With regard to round-trips: if you ask me, it might not hold as a general rule. It implies an origin and even state! Maybe the sender cannot cache your binary data for a round-trip (memory, legal, latency, and security constraints all play a role here). Personally, I like RESTful for simple systems, but for more involved architectures, message passing is much more scalable and easier to distribute.

> As for parsing streaming JSON, I don't know if there are any libraries for it, but the implementation should be very simple ...

Yes, a novice programmer should be able to write it in an hour. But the weird thing is, of all our libraries and all our frameworks (browser-side), none of them do streaming. Ok, I guess we should use websockets with JSON-encoded events for this, but still.

But hey, there are so many metrics with which one can evaluate a data-interchange format. (Recently we did a survey of binary data-interchange formats and found around 25 different criteria... and we were not really being thorough).


Okay. Cool.

I wasn't just trying to flame when I wrote that: talking to people who have different ideas and use different stacks is a good idea, and it teaches you things you didn't know before. And learning about stuff is why I use HN in the first place :-).

>The problem of XProc was similar to the rest of the XML era: it required too much of the implementer to understand. Also, good programmer tooling, such as graph editors for the pipelines were missing.

I will have to look up XProc now, because the things you've been saying sound really interesting, and it's clear I don't really get it.

>But I agree, XML is definitely not the best data-interchange format, but neither is JSON. Some LISPy syntax would be my preference for data-interchange if it needs to be readable.

I suppose. I love lisp considerably more than the next guy, but lisp structures technically only specify linked lists, which are O(n) for all data retrieval. This is also how most implementations implement them. Also, JSON is similar, and trivial to convert to that format:

  ["like", {"key":"this"}]
  =>("like" (("key" . "this")))
Although you would idiomatically use symbols in many places where JSON uses strings.

>XML Schema actually does spec binary data

Once again my inexperience with xml shows. Thanks for letting me know.

>With regard to round-trips: as a general rule it might not hold if you ask me. It implies an origin and even state![...] Personally, I like RESTful for simple systems, but for more involved architectures, message passing is much more scalable and easier to distribute.

Firstly, I'm pretty sure REST implies a message-passing architecture. Correct me if I'm wrong.

Secondly, the round-trip idea sucks for a number of reasons, but I don't think it has to imply either. Let's say you have an endpoint at example.com/<userid>/lastmessage, which might give you the last message the user sent. If the user sent "just ate at Joes, #delicious," you might receive:

  {"message-type":"text", "message":"just ate at Joes, #delicious"}
But if the user sends an image, it's uploaded to the server, and you have to get it down. So you would instead get:

  {"message-type":"image", "message":"X57pqr32"}
And you would ask for example.com/<userid>/static/X57pqr32.

I don't know, but I think that would work.

>But the weird thing is, of all our libraries and all our frameworks (browser-side), none of them do streaming.

And I actually know why this is: before the advent of WebSockets, the only options were XHR or awful ideas (JSONP should chill the blood of any security expert). None of them supported reading incrementally, AFAIK, so there was no point. Now that WebSockets are a thing, it shouldn't be long coming. Now all we need to do is build something to put JSONP in the ground...

>But hey, there are so many metrics with which one can evaluate a data-interchange format. (Recently we did a survey of binary data-interchange formats and found around 25 different criteria... and we were not really being thorough).

Indeed. By the way, did you look at Cap'n Proto and MessagePack? Neither are really on the fringe, but they look interesting, and they seem to have some decent support.


By the way, a good example of the multiple-trip RESTful API I described is XKCD's JSON API (http://xkcd.com/info.0.json)


Yeps, that's basically HATEOAS :)


> Firstly, I'm pretty sure REST implies a message-passing architecture. Correct me if I'm wrong.

It absolutely is, but afaik (correct me if I'm wrong here) it implies an origin. It relies completely on addressable and available resources. It relies on exactly-once semantics (POST) and round-trips. Message passing for me is more like the actor model: ephemeral information, at-most-once delivery, references to computers (actors), not data, and most important: the message is central, not the endpoint.

Perhaps I'm understanding all of this completely wrong, I'm being honest here, but the actor model to me means 'message-passing orientation' and RESTful to me means 'resource orientation'.

> And you would ask for example.com/<userid>/static/X57pqr32.

I implemented more or less the same scheme in a message-centric application for crypto. Larger objects such as photos and videos were encrypted and placed in central storage (a later design phase included a DHT implementation). The receiver could decrypt the message at a later time, whenever the photo was visible in the app/webpage. The central server, however, was none the wiser, as all data was encrypted and carried no semantic information. Here it is interesting to note that, even though we use references (URIs), the resource is not identifiable, except by its SHA hash. There was no sense in saying https://kanta-messenger.com/photos/1234abcd since there is no knowledge of 'photo' or 'video'. However, there is still representational state transfer (REST) going on, without any of the semantics.

> Now all we need to do is build something to put JSONP in the ground...

Agreed

> By the way, did you look at Cap'n Proto and MessagePack?

We had two phases (since it takes quite a lot of time to research each data-exchange protocol). In the first phase, we evaluated on a couple of core criteria: language support (Scala, Java, Python), no long-standing GitHub issues, more than one core committer. We reduced that to three protocols: protobuf, FlatBuffers and Apache Avro. To most of our surprise, the last one won. Why? Various reasons, one of them being the possibility to do reflection and search within encoded messages for which the receiver does not have a schema. For example, you might want to create a router which only routes messages that contain a certain header. Another is archiving: since the schema is always included, it is possible to decode messages years after they have been stored somewhere. A third is forward- and backward-compatibility. All of them were close wins (4 vs. 5 stars), but it brought us to Apache Avro. Looking back on that decision, it was a good one. Many within the company are happy with the choice.
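
(To sketch the archiving point with the Avro Java library, with a made-up schema and file name; the container file embeds the writer's schema, so a reader years later needs nothing else:)

    import java.io.File;
    import org.apache.avro.Schema;
    import org.apache.avro.file.DataFileReader;
    import org.apache.avro.file.DataFileWriter;
    import org.apache.avro.generic.GenericData;
    import org.apache.avro.generic.GenericDatumReader;
    import org.apache.avro.generic.GenericDatumWriter;
    import org.apache.avro.generic.GenericRecord;

    Schema schema = new Schema.Parser().parse(
        "{\"type\":\"record\",\"name\":\"Msg\",\"fields\":"
        + "[{\"name\":\"header\",\"type\":\"string\"}]}");

    // Writing: the schema is embedded in the container file itself.
    GenericRecord rec = new GenericData.Record(schema);
    rec.put("header", "route-me");
    try (DataFileWriter<GenericRecord> w =
            new DataFileWriter<>(new GenericDatumWriter<GenericRecord>(schema))) {
        w.create(schema, new File("msg.avro"));
        w.append(rec);
    }

    // Reading, years later, with no schema on hand: the reader
    // recovers it from the file.
    try (DataFileReader<GenericRecord> r =
            new DataFileReader<>(new File("msg.avro"),
                                 new GenericDatumReader<GenericRecord>())) {
        System.out.println(r.getSchema());
        System.out.println(r.next().get("header"));
    }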


>All of them were close wins (4 vs. 5 stars), but it brought us to Apache Avro. Looking back on that decision, it was a good one. Many within the company are happy with the choice.

Neat, I may check it out.

>It absolutely is, but afaik (correct me if I'm wrong here) it implies an origin. It relies completely on addressable and available resources. It relies on exactly-once semantics (POST) and round-trips. Message passing for me is more like the actor model: ephemeral information, at most-once delivery, references to computers (actors), not data, and most important: the message is centric, not the end-point.

I mean, that IS a valid way to think about it. I think about it like this:

When you're using a REST API, you are sending a message to an application. That application is defined in part by your endpoint: the server, and the path to the app. The rest (params, method, remaining path) is the message itself. Some applications map the messages you send them onto a sort of virtual filesystem, which may or may not correspond to a real one. This appears in webservers and many APIs; for these, the messages you send consist primarily of paths. Others treat their messages more as procedure calls, and use more params. Both are messages, just as surely as

  cat /proc/sys/net/ipv4/ip_forward
and

  sysctl net.inet.ip.forwarding
even though one uses a filesystem model, and the other uses a command.

But your model of REST, while less linked to message passing, has much less cognitive load.

There's something wrong with me.

Actually, it's funny we're discussing message passing, because I've been working on an app that uses message passing between pre-emptive co-routines, and kinda-sorta unidirectional data flow heavily. Of course, at 2 coroutines per connection, it won't scale. Thankfully, it won't have to.

I hope.


> - JSON doesn't have namespaces, making integration of different data-sources quite hard.

The only reason anyone would ever say that is ... that they have used XML (or SGML) and are subscribed to that mindset.

Every toplevel JSON document is a valid value in any other JSON document. That's how easy it is to integrate. The only reason that XML/SGML needs namespaces in the first place is that the schema dictates what an element, e.g. <block>, can have as children or attributes, and how many -- and as a result, <block><statement/></block> from a programming language and <block><cityblock/></block> from a city design schema cannot be mixed (neither would be valid in the other's schema). So you have to use <code:block><code:statement/></code:block> and <city:block><city:cityblock/></city:block> to differentiate them.

> - XML allows me to do versioning within documents.

What kind of versioning are you referring to? Schema versioning? Data versioning?


I've never really subscribed to the mindset of XML, for it has many disadvantages (the verbosity, the complexity of DTD, the tendency toward documents which are too large). However, I do subscribe to namespaces, since they allow global referencing of names. I also do subscribe to formal grammars, mature standards, and good documentation. FYI, I've designed streaming-JSON-based secure messaging systems, did binary-only schemas for speed, and for simple tasks I just implement JSON+REST, since everyone nowadays has come to expect it. It's just that I think XML got an undeserved bad reputation and many of the 'good parts' have been forgotten.

With regard to namespaces, when designing standards, it is very useful to separate one 'person' definition from another, since they might not have the same semantics. This allows us to connect, say, 'com.facebook:Person' with 'com.google:Person' with a global equivalence relation. It allows us to specify bridges between standards.

I don't really subscribe to the 'it is a valid value in another JSON document' argument. It is only valid when it can be interpreted by a receiving program (otherwise it is data, not information). The namespaces are not there for validation (alone); they are there for interpretation.

With versioning, I meant schema versioning. Admittedly, not a great solution, but at least it allows a receiving party to know which parts can safely be interpreted.


> I do subscribe to namespaces, since they allow global referencing of names

I have been forced to use XML one way or another for a variety of uses (mostly integration, not document storage - but still), and have not ONCE had a use for namespaces or multiple DTDs in a single document. I suspect no one has statistics, but I wouldn't be surprised if this is true for 99.9% of {users,documents,systems} - which, if true, means that 1/1000 burdens the rest needlessly. But of course, this is mere speculation.

> I also do subscribe to formal grammars, mature standards, and good documentation.

XML is enticing by appearing to have those, but it actually doesn't, as Naggum articulated in [0]. XML schemas can describe a superficial structure, but not anything non-trivial and definitely not any semantics. Naggum is entertaining, though he holds nothing back; see e.g. [1].

> It's just that I think XML got an undeserved bad reputation and many of the 'good parts' have been forgotten.

The problem with XML is that, like lawyers, 95% of the population gives an undeserved bad reputation to all the rest. XML did have some good ideas, but they are almost nowhere to be found in practice.

> This allows us to connect, say, 'com.facebook:Person' with 'com.google:Person' with a global equivalence relation. It allows us to specify bridges between standards.

No it doesn't, unless they are semantically equivalent - which they never are. They might be superficially similar, with some translation possible using (e.g.) XSLT. But if, for example, com.google:Person has <DisplayName> and no first/middle/last, and com.facebook:Person has <FirstName>, <MiddleName> and <LastName> (but no display name), then XSLT can only translate one way, and nothing can translate the other way without error. It's nice in theory, but - projecting from my experience which is long and across many industries, but obviously still anecdotal - in practice, the semantic differences always require logic beyond XSLT, and thus the namespaces are only of aesthetic value if any.

> It is only valid when it can be interpreted by a receiving program (otherwise it is data, not information)

True. How is that different from XML or anything else? The same statement applies to XML, namespaces or not. If the program doesn't know what it is interpreting, the namespaces do not matter. If it does know, they don't matter either. Sure, it's a way to mark the source through _all_ elements, but since the program must be aware anyway, you can just as well enclose your Person object with {Facebook: {first:'John', last:'Smith'}} or {Google: {display:'John Smith'}}. Yes, XML has a standard way of doing that, but in practice my experience is that it costs about 1000 times what it provides.

> With versioning, I meant schema versioning. Admittedly, not a great solution, but at least it allows a receiving party to know which parts can safely be interpreted.

And what if semantically, the parts you don't know about make interpretation moot? Practically, if it's a version you don't know, you shouldn't try to interpret it. And that's achieved by a simple 'version' field in JSON. The standard way of doing this buys practically nothing - 99% of XML files out there do not declare or properly follow a DTD.

[0] http://www.xach.com/naggum/articles/3224504693262432@naggum....

[1] http://www.schnada.de/grapt/eriknaggum-xmlrant.html


> and have not ONCE had a use for namespaces or multiple DTDs in a single document.

I'm actually rather surprised about that. Take an XSLT document and you're bound to use multiple namespaces. Have you never used an editor which provides tab-completion, quick validation and documentation of tags on the fly? The systems I worked with heavily relied on namespaces for validation, exploration, versioning and prevention of naming clashes. These, however, were heavily distributed systems within government organisations.

Also, please note I'm talking about namespaces within and outside XML. I'm saying that namespaces are a cheap and easy to implement design rule.

Ok, now your references.

[0] is actually an argument for namespaces (and XML schema or suchlike). If I understand correctly, he proposes a system which allows you to specify part of an XML document post-hoc (using a namespace which references a schema which is specific to the module-writer).

The second one, I must admit, had a low signal-to-noise ratio for me. The writing seems to refer only to XML and DTD, and says nothing about the larger ecosystem, which my arguments were about. Anyway, remove all the banter and you're left with a couple of arguments:

1. the syntax is verbose (yes it is, nobody disagrees, not even the designers).

2. there is no macro support (perhaps useful, one could embed an XSLT stylesheet if necessary). I find this a minor point. It would also severely complicate the parsers and make them stateful and memory-bound.

3. binary representation (in line with 1). How many good, portable binary structured editors do you know? How much does the size of an XML document improve with simple compression to binary? (hint, quite a lot). Also, when going to binary, there are many other design choices, such as: should it be possible to memory-map the document so the CPU is not involved? Should pointers be employed so we can skip sections of the document? Should we use names, ids, UUIDs? Do we optimize for processing use, network use, memory use? [1] seems to only argue about network utilization (which, for most applications, is abundant).

The rest of the document (I have to admit, I skimmed some parts) appears to be a rant about everything and everyone stupid. The king who shouts "I am the king!" is no true king.

<skipping some parts>

> If it does know [the namespaces], they don't matter either.

To structure something, we first need to construct (i.e. bring together) and later we need to deconstruct. In both cases, it helps to have namespaces, because (de)construction might involve many different distributed parties with different versions of software. It is my opinion that in truly distributed systems, naming and typing are of utmost importance.

It's getting quite late here. Thanks for making me think about this subject again. Sadly I cannot answer all of your points within a reasonable time.


XPath expressions are actually pretty cool (and I say this as not a great fan of XML in general). The ability to search and select elements is something we use all the time. For many examples, search for "xpath_" in https://github.com/libguestfs/libguestfs/blob/master/v2v/inp...


I'd drink to that, XPath is incredibly useful and easily mastered.

I'm also fond of XSD and XSLT myself. They can be obtuse at times, but have been indispensable in the use cases that I've needed them for.


If you're actually marking up text, JSON doesn't really work. XML works alright for its intended use as a language. (That doesn't mean I hate it any less.)


I think if you're marking up text, you really want to be using Markdown or one of the other wiki-like formats. For writing, you're much more likely to run into non-technical people who will be severely impacted by syntax errors.


If you are working with statically typed languages, the validation of XML is far superior to anything you can do with JSON, unless you want to write your own format for defining data structures in JSON.

JSON, remember, was written for a language without even firm object structures. It is great in that environment, but all exchange formats require external knowledge to validate, and XML, unlike the others, provides a way to do that.


For one, JSON didn't exist 15 years ago.

For another, JSON didn't have validation or schemas 5 years ago.


Even there, the schemas and validation are very lightweight compared to what XML can do.

As I usually say, JSON is for relatively free-form, dynamically typed languages, but if one side uses a statically typed language, XML is probably the better choice.


Hmm... I always tend to use XML for B2B communication or Mine-to-Theirs type RPCs. I use JSON primarily for Client-to-Server communication internally with own applications or for public APIs.


JSON was defined in April 2001, basically a subset of JavaScript specs from 1999.

So, JSON did exist 15 years ago, but not 16 years ago, although you have to go back 20 years if you want your statement to not just be about the name.

And yet ... how many of the decisions to use XML go back those 15 years? Hardly any.


JSON may have been created 15 years ago but it wasn't well known or commonly used for a number of years. Yahoo! only started using it in 2005 and Google in 2006. XML had been around and in use for years prior to that and even today has a much richer toolchain.


Because it's preferred or required by a well-paying customer. With any sane web framework you get both JSON and XML out of the box. If the well-paying customer doesn't send an XML Accept header, you make it the default response type and tell everyone else to send a JSON Accept header. If there's a conflict between multiple well-paying customers, you make new endpoints.


Working in a statically typed language, and using schemas to ensure that the generated messages are correct, because we can generate classes that map onto the required data structures.

So far as I know JSON doesn't allow for that.
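
(A sketch of the XML side of that workflow with JAXB, assuming a Note class generated from note.xsd by the xjc tool:)

    import java.io.File;
    import javax.xml.bind.JAXBContext;
    import javax.xml.bind.Unmarshaller;

    // Note was generated from note.xsd by xjc, so its fields are
    // guaranteed to line up with what the schema requires.
    Unmarshaller u = JAXBContext.newInstance(Note.class).createUnmarshaller();
    Note note = (Note) u.unmarshal(new File("note.xml"));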


JSON is nice for structured data, but when it comes to more mixed, document-style data, XML is preferable IMO.


Most of the advice applies equally to generating JSON data as well.



