JSON vs. XML (corecursive.com)
152 points by geffchang on April 6, 2023 | 245 comments



This quote is funny:

    Douglas: The first time I saw JavaScript when it was first announced in 1995, I thought it was the stupidest thing I’d ever seen. And partly why I thought that was because they were lying about what it was.
A bigger more interesting thing though is how his company failed, in part, because they used hand-rolled JSON for messaging.

    Douglas: And some of our customers were confused and said, “Well, where’s the enormous tool stack that you need in order to manage all of that?” 

    “There isn’t one, because it’s not necessary”, and they just could not understand that. They assumed there wasn’t one because we hadn’t gotten around to writing it. They couldn’t accept that it wasn’t necessary.

    Adam: It’s like you had an electric car and they were like, “Well, where do we put the gas in?”

    Douglas: It was very much like that, very much like that. There were some people who said, “Oh, we just committed to XML, sorry, we can’t do anything that isn’t XML.”
I started my career during peak XML craziness and, while I liked parts of it at the time, the number of things it was used for was quite insane. I had to maintain a system once where a major part of it was XSLT, when it could have just been a simple imperative algo with some config settings.

Anyhow, hope you like the episode!


> I had to maintain a system once where a major part of it was XSLT

Every time the topic comes up I feel the need to say that I loved XSLT. It was so nice. XML frankly was kind of simple, too. It had elements and attributes and that was it. And it had xpath, which offered, among other things, a parent axis, so you could walk the node tree upwards.

In JSON you can't get to the parent from the child. And walking down a tree is unintuitive, because nodes can be of different types, and if you want to maintain the order, or use successive instances of the same things (that would have the same name) you need to use arrays, and arrays of arrays of arrays look bad. Schemas are an afterthought.
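A rough sketch of that difference, assuming the third-party lxml package and a made-up toy document: the XML node can hand you its parent, while parsed JSON cannot.

    # Parent axis in XML, a sketch using the third-party lxml package.
    from lxml import etree

    doc = etree.fromstring("<recipes><recipe><title>Soup</title></recipe></recipes>")
    title = doc.xpath("//title")[0]
    print(title.getparent().tag)     # recipe -- walk upwards via the API
    print(title.xpath("..")[0].tag)  # recipe -- or via the XPath parent axis

    # The JSON equivalent parsed with json.loads is just nested dicts/lists;
    # a child holds no reference back to its parent, so you carry that yourself.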

JavaScript is cool -- it has mostly eaten the world anyway. But JSON is not so good IMHO.


I would describe it something like: XML is great as a document format, but shitty as an RPC format. JSON is vice-versa. Web developers spend a lot of time with JSON as an RPC format, so they tend to put it on a pedestal. But try keeping your recipe collection structured in JSON text files and the pain will start immediately. YAML is even worse.

XSLT was (and still is) great for transforming documents. Want that recipe collection as HTML? Easy.


Yep:

- If you are describing hierarchical data, JSON is great

- If you are describing text with markup, especially extensible markup, for machine generation and consumption, XML is great.

- If you are describing a graph, neither have broadly accepted standards so you are kinda on your own.

Depending on your requirements, a recipe collection might be better in XML or in a flavor of markdown. A comprehensive data schema and software support for recipes could be challenging/limiting, compared to marked-up text.


Markdown (like HTML) offers formatting structure, not semantic structure. Maybe you want to query for recipes that can be made in under an hour, or that contain orange as an ingredient (as opposed to merely a serving suggestion in an orange bowl). A proper XML (or even JSON or YAML) structure would enable this, Markdown does not.

You can pretty easily translate XML to Markdown using XSLT, though.

I don't think hierarchical structure is the differentiator; recipes and web pages are hierarchical and they'd still be hell in JSON. XML handles hierarchy just fine. I think the differentiator is whether your content is a document, that is, composed significantly of multiline text. Multiline text in a JSON file tends to be human-hostile, but we're all comfortable editing e.g. HTML.


> If you are describing a graph, neither have broadly accepted standards so you are kinda on your own

Relational databases can describe most graphs, but they rarely ever have a great text format.


And CSV for tabular data!


If only it was more standardized :-(


I first touched XSLT in 2010. I appreciated what it could do, but it was painful to work with due to poor documentation and tooling. This has only gotten worse by comparison with alternatives.

You can still do XSLT in the browser. You can serve arbitrary XML and transform it. As an example, Atom feeds on my website (such as <https://chrismorgan.info/blog/tags/meta/feed.xml>) render just fine in all mainstream browsers, thanks to this processing instruction at the start of the file:

  <?xml-stylesheet type="text/xsl" href="/atom.xsl"?>
But working with it is not particularly fun, because XML support in browsers has been only minimally maintained for the last twenty or so years. Error handling is atrocious (e.g. largely not giving you any stack trace or equivalent, or emitting errors only to stdout), documentation is lousy, some features you’d have expected from what the specs say are simply unsupported (and not consistently across engines), and there are behavioural bugs all over the place, e.g. in Firefox loading any of my feeds that also fetch resources from other origins will occasionally just hang, and you’ll have to reload the page to get it to render; and if you reload the page, you’ll have to close and reopen the dev tools for them to continue working.


I think out of the box, browsers can only do XSLT 1.0; but Saxon offers a JS version of their engine that does XSLT 3.0 and is free (as in beer): https://www.saxonica.com/saxon-js/index.xml


It’s actually worse than XSLT 1.0 due to inconsistencies and incompletenesses. For example, Firefox doesn’t respect <xsl:output method="html">, but uses an XML parser on the transformed result regardless; and doesn’t support disable-output-escaping. I wanted these for my Atom stylesheet (for <content type="html"> and the likes; instead I had to emit serialised HTML and decode it in JavaScript, though with difficulty I could have done feature-detection to skip that step if disable-output-escaping worked).

Even perfunctory probing shows fairly serious problems in Firefox (where Chromium is consistently much better, in this specific area). I could file quite a few bugs in short order (e.g. these mentioned, bad document.contentType values, <template> not working properly), but I don’t think there’s any interest in fixing things.

(I wrote this comment as much for my own future reference as anything else. XML/HTML polyglot stuff makes things decidedly messy at times.)


> In JSON you can't get to the parent from the child. And walking down a tree is unintuitive, because nodes can be of different types

JSON only competes with XML. XSLT, XPath, and XSD are just as much an afterthought in that they are completely separate from XML and are entirely optional. The engines written around those are where the power to walk the tree and validate comes from, not XML itself. There's a wide range of tools to get the same benefits for JSON sources, and they usually handle XML and other data sources too, because it shouldn't matter. The reason the X* tools have fallen out of favor is because they're unnecessarily tied to a single type of source data.
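For example, JSONPath gives you roughly the XPath experience over plain JSON; a small sketch assuming the third-party jsonpath-ng package:

    from jsonpath_ng import parse

    data = {"cds": [{"title": "La Brise", "artist": "Arax"},
                    {"title": "Led Zeppelin II", "artist": "Led Zeppelin"}]}
    # Query the structure much like an XPath expression would.
    titles = [m.value for m in parse("cds[*].title").find(data)]
    print(titles)  # ['La Brise', 'Led Zeppelin II']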


JavaScript is as good as JSON. It has eaten the world just because it was in every browser. Similarly Chrome, advertised on the biggest search engine.


> I started my career during peak XML craziness and, while I liked parts of it at the time, the number of things it was used for was quite insane. I had to maintain a system once where a major part of it was XSLT, when it could have just been a simple imperative algo with some config settings.

Same here. XML was going to save the world! Remember XML data islands with data embedded in page source and displayed via XSLT?

The craziest thing I had to build was a tool to manage the dozens to hundreds of XML configuration files that powered our product. The tool allowed editing and deploying the files, complete with validation and even input suggestion based on associated XSD for each XML file.


I remember the XML is everywhere phase. The community that hasn't retired or passed on has largely come off of that. You can return JSON natively from XSLT 3.0 now. I've been on both sides of the love/hate fence with XML, but these days when I have the need to work with it I leave the projects really satisfied.


The best part of the XML craze was the deprecation of proprietary binary file formats. I had to work with some software that was truly awful, but since it saved its data as XML, I could skip the software entirely and work directly with the data.


Nowadays you have the same type of tools, only for YAML. Not sure if that's so much better.


Which tools are you thinking of?


The Github actions editor for example.


YAML can die in a fire


I do just like this with json. Creating api blueprints.


I really respect that you provide transcripts. It's terribly important for accessibility and for getting what people have said into the various search engines.

I was sad to hear that Crockford is not aiming to be the author of "the next language" anymore, but I wonder how sincere that really is. His thoughts on actor-based languages are interesting.


Thanks!

Crockford's thoughts on actors are really interesting. I tried to pull them apart but I didn't get very far and ended up not including them in the episode.

What he is envisioning is not exactly like Erlang but not exactly like Scheme. He said that Carl Hewitt had a lot of ideas and they were hard to unpack.

If you're interested though, I would reach out to him. He is very approachable and excited to talk to people with ideas for new ways of making things simple.


I do wish some more details about his current thoughts on actors had been included, though I can understand if they were hard to tease out. I would guess that may go back to his work on Electric Communities as E was a message passing system. Would be interested in hearing more of where his mind is on actors today.

The closest thing we have right now I think is Spritely Goblins, though that is Scheme. (Not coincidentally, one of the other Electric Communities co-founders is also a Spritely Institute co-founder: https://spritely.institute/about/)


The design of the language he is (was?) working on is here:

http://www.crockford.com/misty/

There are a few talks from a few months ago on Youtube that detail some of the rationale behind it.


Oh cool, thanks for the pointers!


Just wanted to say your podcasts are really good. Many podcasters aim for quantity over quality; with you it’s the other way around. Thank you!


I remember a meeting where a consultant from an MCP excitedly told our mutual client that the XP in the upcoming version of Windows stood for 'XML Protocol.'

More innocent times.


I had a power strip which had "works with windows 95" on the packaging box.


This is just an example of how marketing knows nothing about the products they are marketing, or are just flat out snake oil sales. Like packaging for bacon exclaiming "gluten free"


>This is just an example of how marketing knows nothing about the products they are marketing

Or it is a question that people who are not familiar with technology will ask frequently and makes no sense to people who are.


I use bing, and tell people to 'bing it' sometimes (instead of 'google it').

I had a family member ask "will bing work with my yahoo?"

So... I know those words individually, but... man... together, in that order... I don't know how to respond. I think I said something like "don't worry about it - it's not that big a deal" and left it at that.


I mean... but it is? They're not wrong. Either the marketing works or it doesn't, but it's not false. If it works, then it's good marketing. If it doesn't, then the marketers don't know their market.


Ah yes, "this cereal is asbestos-free!" kind of marketing. It's on the same page with non-GMO salt.


Exactly. It doesn't really matter that the claim is technically true, it just has no bearing whatsoever on the product itself.


Wow, and I thought my headphones from 1998 that said "MP3 Ready" on the package were stupid.


Did it work with 98?


I wasn't working at the same place in 1998, so I can't verify.

The hype around Windows 95 at the time was... incredible. Midnight store openings, hours-long line waits... insanity.


> The hype around Windows 95 at the time was... incredible.

I left one job for another in 1994 simply because the new company had access to the Windows Chicago beta program (and it was a 20% salary bump). But the main reason for me choosing that org was the beta, because I had other offers with similar salary bumps at the same time.


Scala had XML literals as part of the language!

Apparently Philip Wadler was the person who told them they needed it, because the future was XML.

(Wadler is a big Haskell/PL person)


That's surprising! Wasn't it Philip Wadler who said "The essence of XML is this: the problem it solves is not hard, and it does not solve the problem well."

http://harmful.cat-v.org/software/xml/


The irony, to me, is how it is viewed as a bit of a mistake that they had XML in Scala. Hard to square that with how instrumental JSX was in getting some of the modern JavaScript frameworks as far as they have gotten.


If the web spec had a JSON analog for HTML, I think you'd see a lot less love for JSX.


Maybe? Though, I suspect JSX just got lucky with when it got popular. There were plenty of mixed HTML/Script things in the past. Most of them met a lot of resistance.


ActionScript 3 as well as the E4X support in Mozilla's Browser+JS/XUL engine for a while (removed now afaik).

Around that time it was pretty nice passing around XML, as I was forced to work with VB.Net which also had an XML literal syntax on the backend and Flash/AS3 on the UI.

I had built a POC with E4X that was VERY similar to React/Redux over a decade before React, but the other browser vendors didn't have it... At the time IE and Chrome were shifting towards JSON.


VB.NET still has them. I remember doing a project with an XML database (ugh) back in college when the version with the literals was released. I was ecstatic.


Was it far from JSX?


It was almost literally JSX (except with Scala syntax instead of Javascript, of course).

An XML literal in Scala would have been:

  // XML literals (to be dropped)
  val mails1 = for (from, to, heading, body) <- todoList yield
    <message>
      <from>{from}</from><to>{to}</to>
      <heading>{heading}</heading><body>{body}</body>
    </message>
  println(mails1)
This was replaced with "XML string interpolation", which looks like this:

  // XML string interpolation
  val mails2 = for (from, to, heading, body) <- todoList yield xml"""
    <message>
      <from>${from}</from><to>${to}</to>
      <heading>${heading}</heading><body>${body}</body>
    </message>"""
  println(mails2)


What are some examples of the "enormous tool stack" required for XML? I ask, because I came into software development after everyone adopted JSON. When I do need to parse XML, there was a library I could use, although I will admit that needing xpath was a bit annoying.


If your XML is written the way people write JSON, then the stack isn't enormous. But XML is usually wrapped in layers of additional complexity: SOAP envelopes and the namespaces they require, the XSLT that someone invariably used to write an XML transformer, etc.


Also broken parsers. So many broken parsers. Honestly this is what keeps JSON going: the parser was simple because the language was simple, and as a result any JSON you got basically worked the same way.

Not so with XML: all the parsers were insanely complex with the namespacing and whatnot feature support and possible external URLs and everything else... and as a result no XML library was ever adequate to interface with anything. On multiple occasions, the best way to build XML for something was to take a working copy, and then glue text together so you would exactly replicate whatever that specific application wanted, rather than trying to use anyone's library for it.


This was before my time, but I believe the WS-* series of specifications is an example.

> Like with the original J2EE spec, which sought to complicate the basic mechanics of connecting databases via HTML to the internet, this new avalanche of specifications under the WS-* umbrella sought to complicate the basic mechanics of making applications talk to each other over the internet. With such riveting names as WS-SecurityPolicy, WS-Trust, WS-Federation, WS-SecureConversation, and on and on ad nauseam, this monstrosity of complexity mushroomed into a cloud of impenetrable specifications in no time. All seemingly written by and for the same holders of those advanced degrees in enterprisey gibberish.

https://world.hey.com/dhh/they-re-rebuilding-the-death-star-...


> When I do need to parse XML, there was a library I could use, although I will admit that needing xpath was a bit annoying.

It sounds a bit like someone paved a garden path for you by that point. One of the reasons for the "enormous tool stack" wasn't just depth of tools needed ("tool X feeds tool Y which needs tool Z to process namespace A, but tool B to process namespace C, …"), but also the breadth. I recall there were at least six types of parsers to choose from with all sorts of trade-offs in memory utilization, speed, programming API: a complicated spectrum from forward-only parsers that read a node at a time very quickly but had the memory of a goldfish through to HTML DOM-like parsers that would slowly read an entire XML document all at once and take up a huge amount of memory for their XML DOM but you could query through the DOM beautifully and succinctly. (ETA: Plus or minus if you needed XSD validation at parsing time, and if you wanted the type hints from XSD to build type-safe DOMs, etc.)
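To make the streaming-versus-DOM trade-off concrete, here is a rough stdlib-only Python sketch (the tiny document is made up): a forward-only pass that keeps just its own state, versus a tree parser that holds everything in memory but is trivial to query.

    import io
    import xml.sax
    import xml.etree.ElementTree as ET

    xml_bytes = b"<order><part_number>1</part_number><part_number>2</part_number></order>"

    # Forward-only streaming parse: constant memory, goldfish memory.
    class PartCounter(xml.sax.ContentHandler):
        def __init__(self):
            super().__init__()
            self.count = 0
        def startElement(self, name, attrs):
            if name == "part_number":
                self.count += 1

    handler = PartCounter()
    xml.sax.parse(io.BytesIO(xml_bytes), handler)
    print(handler.count)                     # 2

    # DOM-style parse: whole tree in memory, but easy to query afterwards.
    root = ET.fromstring(xml_bytes)
    print(len(root.findall("part_number")))  # 2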

A lot of XML history was standards proliferation in the xkcd 927 way: https://xkcd.com/927/

XPath tried to unify a lot of mini-DSLs defined for different DOM-style XML parsers.

XSLT tried to unify a bunch of XML transformation/ETL DSLs.

The things XPath and XSLT were designed to replace lingered for a while after those standards were accepted.

Eventually quite a few garden paths were paved from best practices and accepted "best recommended" standards, and greenfield projects start to look easy, with a small number of well-coordinated tools. But do enough legacy Enterprise work and you can find all sorts of wild, brownfield gardens full of multiple competing XML parsers using all sorts of slightly different navigation and transformation tools.


The last time I worked with XML, using an external library wasn't really a great option. I ended up writing my own parser in C++. It took about a week to get all the features required for my purpose.


Honestly, if you just need a one-off transformer, VB.Net is probably one of the better options. The .Net XML library is pretty good in the box, and VB.Net has XML literal support on top... if you just need to read, then C# is a better language imo.


VB.NET is masochism, and XML literals are a dumb, confusing, and unnecessary feature. Creating XML elements with regular code is quite easy, and at scale, you probably want templates.


XML might not have been so bad, if there weren't dopes pushing SOAP.


I have huge respect for Doug Crockford, and I never imagined I would disagree with him.

However I think by now we've seen that a lot of that "unnecessary" XML complexity was not, in fact, entirely unnecessary. These days we use JSON for everything, but now we've got JSON Schema, Swagger/OpenAPI, Zod, etc etc. It's not really simpler and there's a lot of manual work - we might as well be using XML, XSD & SOAP/WSDL.


XML is pretty awesome. As a 12 year old in 1998 I remember being enthralled with it and my young mind imagined a lot of possibilities. But alas, I had no real use for it until xhtml, but even then...As the years passed I learned to drive and get the heck off my computer for a few years.

It wasn't until about a decade later when I finally got to use XML "for real", at my academic publishing job. One of my first real projects was having a set of academics analyze documents in a web application I built. Prior to that the documents were analyzed by hand, converted to SGML somewhere in Korea, and we would use OmniMark to move them to XML and eventually into a library application.

The XML community, the ones who haven't retired or passed on, have been more welcoming of the competition too. They went from XML is everywhere, to being able to return JSON from an XSLT. I am in a small shop, and so I wear many hats. But I am always satisfied when I get to work with XML, or craft an xsl/xq script that does exactly what I need. Additionally, the community as a whole is very helpful, and a bit more grey. Meaning, they are less likely to fall for trends and bullshit.

A bit disjointed, but, in short, XML is awesome. Now if only they would move Balisage back to Montreal. I'm no fan of DC or virtual conferences.


With XML, the complexity is the baseline, and it only goes up from there. With JSON, the complexity is just an option, the baseline is pretty simple. Also, good XML-tools are rare or expensive.


Baseline for XML would be a document that doesn't use schemas, namespaces, attributes, or any of the SGML legacy stuff like DTDs and PCDATA.

Such a document is essentially as simple as the equivalent JSON.


Even that is more complicated than JSON.


Care to elaborate?


Every key is written twice, for opening and closing. Keys can be duplicated, and in fact that's what you have to do if you want a simple list. There aren't numeric types, so you have to parse strings. It also looks horrible.

  <cds>
    <cd><title>Led Zeppelin II</title><artist>Led Zeppelin</artist><price>999</price></cd>
    <cd><title>La Brise</title><artist>Arax</artist><price>999</price></cd>
  </cds>
or

  <cds>
    <cd>
      <title>Led Zeppelin II</title>
      <artist>Led Zeppelin</artist>
      <price>999</price>
    </cd>
    <cd>
      <title>La Brise</title>
      <artist>Arax</artist>
      <price>999</price>
    </cd>
  </cds>
vs something like

  [
    {"title": "Led Zeppelin II", "artist": "Led Zeppelin", "price": 999},
    {"title": "La Brise", "artist": "Arax", "price": 999},
  ]
You can probably do better using XML attributes. But then you're using more features.


If we are complaining about the closing tags, might as well add that embedding newlines or quotes into JSON is less than pleasant.

Which is to say, this feels like a bit of a non-issue. Yes, writing it by hand can get tedious, but that is true of any and every format. That's why you will almost certainly reach for other formats if writing a long list of data. And each and every one of them will fail for some form of input in ways that are frustrating.


Writing that JSON example by hand wasn't tedious. The XML example was, and the result is unreadable. It's important to be able to debug things easily. I'm going to manually type JSON when I'm testing an API, and I'm going to read the response.

If you absolutely don't care about human interface, no reason to use XML either. It's meant to be more verbose. The XML tags will often dominate the size of the payload with things like `<question>Who</question>`, so you have to start thinking about shorter names. Yes JSON has a similar problem, but at least it's halved and you don't have to instruct everyone to call each list element "e". If you super care about size, you'll use protobufs or something.


<question>Who</question> vs. "question":"Who",

To me, this does not seem like a win that's worth much, especially since it's likely to shrink considerably even with naive fast compression.

Furthermore, as messages grow in size, the explicitly named closing tag actually kind of starts helping.

Both of these syntaxes have their annoying quirks, for sure, and I understand you really dislike the closing tag; that clearly doesn't bother many people.

But regardless of personal preference, I'm really skeptical any of this really explains json's relentless path to replace (most) xml. Other reasons, such as the extreme wordiness some xml apis chose, the poor implementation of namespaces, the problems with embedding arbitrary data (in particular control characters), the inconsistency between attributes and elements, the lack of support for numbers, the lack of (conventional) support for key-value pairs - all of these surely played a much greater role than a fairly limited syntax issue.

And it's not even like json is without impractical quirks; lack of comments, the ban on trailing commas, and the need for quotes in object-keys spring to mind. Yet those don't mean json is likely to die out soon - even though even javascript itself from which it is derived doesn't suffer from those (anymore)!


Not wrong, but also probably not really indicative of problems or actual use. And while I will be manually typing some data to go into an api for testing, I'm far more likely to be typing it in something that is looser in what it accepts than a json document. Literally today, just using dicts in python. And even then, my debugging is dominated by mistakes in data entry there.

Also, I see you took it to be a full on defense of XML. I did not really intend it that way. I think both can be fine. And insisting on either is likely a mistake.

I do find your nitpicks here amusing, still. Size of tag is just as obnoxious as size of key. And, though it can dominate the textual representation, there are clear ways to reduce that. Even knowing that BSON and Binary XML exist, though, I'd be hard pressed to name any project that failed because it wasn't using them.


JSON vs XML isn't going to make or break your project. But why would you use XML for data interchange? It makes sense for things like HTML where you're writing a document, but otherwise, it's usually just a needless burden.

Like, if I were there when XMPP was created, yes I would have insisted on JSON. XML was a plainly bad choice. Edit: Oh, JSON didn't exist until a little later. Maybe something similar did.


I mostly agree. I do think Jupyter chose wrong by picking JSON for their documents. They are literally marked up source documents.

XML does have the "benefit" of being a bit more extensible than JSON. Specifically, being able to have namespaced elements in there does make some sense on paper. For example, you could have two extensions both add in data using the same keys, but different namespace. Can't really do that with JSON.

In practice, I think it just fell flat due to way too much "forethought" in things they anticipated people wanting.


Yes, XML is probably a good fit for something like Jupyter. Basically if you want to reuse a lot of "objects" throughout a structure and have them mean the same thing in different nested parts of it. Like how <a> in HTML means a hyperlink whether it's under <body> or some nested <div>.


I'd phrase it more that there is a document with mixed use items marked up throughout it. Some items in the document are code, in which case you probably want to fence the code with a marker on what language is used. Other items are just prose, in which case you'd like to just write the prose as much as you can.

Some items can even be other forms of xml that have their own schemas dictating what is valid. (Thinking SVG here.)

I'll also note that even there, I can see why HTML went with the odd parsing they do. XHTML tried going with "well formed" documents, but that falls flat for the authors. That's why "sections" of a document are essentially just collecting all of the "h" tags and making an implied tree out of that, as opposed to making the tree directly. To that end, my markup language of choice for Jupyter-style things is org-mode in emacs. Yes, it has some warts; but again, all formats that I have ever seen have warts.

Edit: I want to add that I don't intend this as a "correction." I should say that I agree with your post. Complicated field where I doubt I'd have done better than most others. :)


Yeah, now express this in JSON:

   <div>
     <p>JSON example:</p>
     <pre>
      [
        {"title": "Led Zeppelin II", "artist": "Led Zeppelin", "price": 999},
        {"title": "La Brise", "artist": "Arax", "price": 999},
      ]
     </pre>
     <p>Source: <a href="https://news.ycombinator.com/item?id=35472014">click here</a>!</p>
   </div>
JSON is great for a certain domains, but there are other domains where it is a nightmare and XML shines.

Use the right tool for the job.


you can't ignore ux stuff like this in a protocol that's meant for general use

something like duplicating info in closing tags in XML (which applies to every element) isn't really comparable to stuff like having to escape certain characters in JSON strings (which applies only to the values that use those characters)

perfect is the enemy of the good, and the good is the metric


Don't you also have to escape stuff in XML? Like &gt;, which is even worse.


Yes, though many languages have lenient parsers. Most browser parsers, for example, will probably only be lenient if parsing "HTML."

    new XMLSerializer().serializeToString(new DOMParser().parseFromString("<a>hello < </a>", "text/html")) 
The above in my console does as expected there. And again, entities are a very dangerous part of XML and friends.

You are correct that if you tell it that that is xml, the browser will throw it back at you. Just as the JSON parser will barf on JSON.parse("{'test':'value'}").


per specifications, json parsing is not lenient, html parsing is lenient


Right, and amusingly, more than a few json parsers are very lenient in this. That or folks abandon ship fairly quickly and go for another spec that is far more friendly.


well json definitely does not accept `{'test':'value'}` as valid input

any parser that behaves otherwise is pretty clearly buggy

json has many problems but parsing ambiguity is not really one of them


Me thinks you have never looked at the field. I'd as soon declare csv is an error free format. Only true if you ignore the proliferation of applications that get it wrong. In subtle ways, often. Still wrong.


csv is wildly ambiguous, to the frustration of ~every data science engineer in industry

json is not

show me an application that parses `{'a':'b'}` as valid JSON, i'm actually interested, probably there are some which exist, but there is no ambiguity about those applications being wrong



fun doc! it lists many of the undefined behaviors of the spec, and many of the problems in common parsers

afaict none of them permit keys or value strings to be expressed with single quotes


Apologies for the, in retrospect, somewhat lazy posting of an article with no comment. I thought that article had a section about how many of them allow single quotes if you don't "enable strict." I am not seeing it on review, though; so either I made that up in my mind, or I'm remembering another article. Either way, apologies.

I did find https://github.com/json5/json5 on a quick search that basically says what I asserted about people just jumping to another standard for things that you hand write. I was probably also thinking heavily about python's dict syntax. (And I confess, I still don't know when to use single versus double quotes in python...)


no worries mate


To be pedantic, html parsing is not lenient, it is unambiguously specified.


if that were true then browsers would refuse to render text/html responses that didn't include a closing </html> tag, i guess


No, because the closing </html> tag can be omitted according to the current HTML spec. See https://html.spec.whatwg.org/#optional-tags


this is exactly my point

html is not precisely defined


Sorry I don't understand your argument. HTML is fully and unambiguously defined, as you can see if you follow the link. Some tags are optional in certain contexts, but this is also precisely defined.


I think you're missing the point that it is defined, the current html5 spec says that <title> implies the existence of <head>, <body> implies the end of <head>, body tags imply the end of <head> and the start of <body> etc.

HTML5 is not XHTML.

<!DOCTYPE html> <title>Title <h1>Heading

expands to

<!DOCTYPE html> <head><title>Title</title></head> <body><h1>Heading</h1></body>


if `<title>A <h1>Heading` is equivalent to `<head><title>A</title></head> <body><h1>Heading</h1></body>` then this means the language is not precisely defined


Thanks. I get your point about the close element including the tag name - but that's the kind of detail I leave to the serialisation library, in the same way that the close scope token in json is different to the start scope token.

As for "looks horrible"... well yeah, I always feel that xml looks "spikey" somehow. But I've been programming in curly-brace languages for 30+ years and I still find json harder to read than xml: I think my brain tries to interpret it as code, not data. I find xml easier to read (even when its unformatted) precisely because the close-tokens kind of document what element they're closing.

Each to their own I guess. At least we're not stuck using ASN1.


> At least we're not stuck using ASN1.

Prepare for trouble, and make it double: http://xml.coverpages.org/dstc-xer2.html


And if someone is nice enough to stuff a NUL in the document, it all shatters.


Also this may just be the time in which I got into programming showing, but it seems like JSON encoding/decoding has been built into more languages than support for XML ever was. That's one less required dependency and thing to have to think about in many cases, like in Swift projects all I have to do is make sure my model structs/classes conform to Codable and I'm ready to hit endpoints.


That’s because writing a JSON parser is pretty straightforward with just a couple edge cases.

Writing a conformant XML parser is a HUGE undertaking in comparison.

I could get most places to give me the time to write a JSON parser in whatever language if it didn’t have one. I couldn’t do that with XML.

Because of this, every common language (and most uncommon ones) has a JSON parser while XML parsers are less common (and fully conformant ones are even more rare).


Here to say this too. Compositional complexity is an advantage.

As a human in a repl, I appreciate the balance of readability between XML, which uses a larger set of syntactical characters, and YAML, which uses fewer.

I also appreciate JSON's ontological simplicity over XML. This primarily boils down to the lack of attribute nodes and explicit difference between objects (lists of key-values) and arrays (lists of values).


> With XML, the complexity is the baseline, and it only goes up from there. With JSON, the complexity is just an option, the baseline is pretty simple.

Very well put. And we could lower the baseline substantially towards simplicity, even from JSON.

It's pretty clear that a lot of people think this way. Some even seriously try to figure out what such a baseline of simplicity would look like.

There are lots of simple indentation-based designs (similar to YAML) such as NestedText[0], Tree Notation[1], StrictYAML[2], or even @Kuyawa's Dixy[3] linked in this thread.

There seem to be fewer new ideas based around nested brackets, the way S-expressions are. Over the years, I have developed a few in this space, most notably Jevko[4]. If there ever will be another lowering of the simplicity baseline, I believe something like Jevko is the most sensible next step.

[0] https://nestedtext.org/en/stable/ [1] https://treenotation.org/ [2] https://hitchdev.com/strictyaml/ [3] https://news.ycombinator.com/item?id=35469643 [4] https://jevko.org/


I guess it depends on how you define XML baseline. You can have a very simple XML with only bare tags. It will work just fine. Arguably, it's even simpler than JSON that way. A basic parser for that is probably not more complex than a JSON parser.

All the optional complexity that can go on top, though, is probably better specified for XML. Transformation is well defined for XML (XSLT) but not at all for JSON (I guess, you write your own code to manipulate native objects).

Schemas are basically a native feature for XML. Not so much for JSON.

All sorts of specialised vocabularies are defined for XML. A few are defined for JSON, too.


For a lot of XML you need to be able to support XML namespacing, and doing that adds a lot of complexity over the original pure XML.

At first XML namespacing sounds simple. Each tag and attribute will have an optional uri attached to it, no big deal right?

From reading through the specification one could be forgiven for assuming that the prefixes are just arbitrary mappings that a processor can ignore, or automatically remap to alternate prefixes.

For example, it is true that <abc:a xmlns:abc="https://example.com/xyz" xmlns:def="https://example.com/xyz"><def:b>5</def:b></abc:a> (notice both namespaces are the same url) is equivalent to: <a xmlns="https://example.com/xyz"><b>5</b></a>.
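You can check that equivalence quickly with Python's stdlib ElementTree, which throws the prefixes away and expands tags into {uri}local form; a small sketch:

    import xml.etree.ElementTree as ET

    a = ET.fromstring('<abc:a xmlns:abc="https://example.com/xyz" '
                      'xmlns:def="https://example.com/xyz"><def:b>5</def:b></abc:a>')
    b = ET.fromstring('<a xmlns="https://example.com/xyz"><b>5</b></a>')
    print(a.tag)                                     # {https://example.com/xyz}a
    print(a.tag == b.tag and a[0].tag == b[0].tag)   # True -- the prefixes are gone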

Unfortunately, the data model also allows for content to reference the namespaces by prefix, and therefore every general xml processor that supports namespaces must keep around an application accessible mapping from the prefixes to namespaces, as the application may need to be able to access that information to interpret attributes or content. The only exception to this would be if the general XML processor insisted on having schema information for every namespace it might come across. In that scenario it would be able to tell if an attribute value of "abc:b" is really a string literal, or a reference to a namespace identifier (QNAME data type), where the namespace is whatever the current "abc" prefix is bound to, and the identifier is "b".

But obviously we don't want to add full schema support for a simple implementation, so we need to keep the mapping information around, just in case the application needs it. We also cannot easily offer nice features like changing a document to use preferred prefixes for certain namespaces, unless we also keep any prefixes that are used in values that could be interpreted as QNAMES, just in case they actually are, but our processor does not know, because it has omitted schema support for simplicity (or perhaps it included schema support, but does not have a schema available for some namespace).

And that is just the complexity that stems from one fairly small quirk in how XML works.

You also have no idea if an element's content needs to preserve whitespace or not if you don't know the schema, and don't happen to have an xml:space attribute present. Thus if you want to re-indent arbitrary xml for readability safely you could end up with something like this:

    <abc
        ><def
            >5</def
        ></abc
    >


I understand what you're getting at but that is you choosing a higher complexity baseline. Yes, it's a part of the standard but you can choose not to support it. No one said you have to support all of the XML-verse in order to use it effectively in your particular application. The most common cases are usable without any of it. Look at most RSS/Atom feeds, XHTML, SVG. They all can get by with simple tags and attributes.

I'm just not buying the argument that XML's complexity is somehow remediated in JSON. JSON becomes as horrible as XML when you bring it up to feature parity. And that's when there's a way to match features. Whatever people say about XSLT, it is powerful, reasonably well defined, and generic over all documents (even though complex). There's nothing like it for JSON I know of.


If we are going for simplicity, surely S-expressions wins? You can support structures similar to JSON or XML on top of it, but the baseline is simpler.


The new KiCad file formats are all S-expression based[0], except for the project files which are JSON IIRC. I think it works pretty well for representing a tree of typed objects textually. They don't even have any LISP connections. Haven't seen S-expressions used anywhere else, though.

[0]: https://dev-docs.kicad.org/en/file-formats/sexpr-intro/


I’d speculate that human minds and memories work much better with associative structures rather than sequenced ones. JSON draws a clean separation between these two and as a result has clearer syntax for the former.

ie, the benefits of simplicity have a limit.


I don’t write many APIs but every JSON schema I’ve created has been automatically generated by OpenAPI tools. Even then I’ve found schemas of very little use, because everything gets validated on deserialization anyways. Client-side validation is usually already taken care of in practice because users should be serializing using the same type library that deserializes, or reading the docs very thoroughly.

JSON is so much more ergonomic than XML as the lingua franca because I can actually read it. That being said I still have my share of problems with JSON.


That was the cause of the XML problems - everything was generated.

Me? Schemas are a requirement in areas where you need to integrate over different technology / with different implementations. JSON Schema is in those contexts a bit of a kids toy compared to what XML can do.


We’re using Prisma (https://prisma.io) schemas for a particular data exchange project we’re doing so that we can generate JSON schemas, SQLite schemas, PostgreSQL schemas, etc. We have even found a generator to create basic Elixir code from the Prisma schemas.

We’re not using anything else from Prisma, but if we had to implement something else in JS to talk to a database, that would be a contender for our database interface layer (there are only a couple of others that are even remotely usable, having suffered through the disaster of a Sequelize implementation). We’re more likely to use Elixir and Ecto.


Adding to the problems of generated schemas, Microsoft and Sun both had different views on how they should be generated. I bought into the promise of "build a wsdl" and you can get clients from .NET and Java. I lost all of that buy in. Hard.

I don't know that I can lay the blame on either one of them directly, mind. But the industry definitely suffered from the bad faith cooperation of those companies.


Microsoft, Sun, IBM, HP, Oracle et al explicitly made WSDL and related technologies not interoperate... and that is where JSON + universe has been a delight.


Totally fair. Not sure why I limited my memory to just the two companies.

I'm not clear on how JSON as a format has helped interaction. I'm reminded of like efforts to standardize how information is stored on pages. By and large, that ship sailed and sites that have remained somewhat stable have driven how we look for information on them. All without having to add new schema languages or tools.


I can still read the generated JSON.


> because everything gets validated on deserialization anyways

First, it really depends what you're deserializing with. There is a lot of code out there that just does JSON.parse and then starts accessing the data and then you have an "undefined" get passed deep into the call stack where maybe it explodes or maybe the program just misbehaves. So if you're using a language like JavaScript or Python, then a JSON schema can be used to validate input right away. Think of it like enforcing a pre-condition.

It's also useful in cases where JSON is being used for configuration files. At my company we have quite a few places where JSON files checked-in to a git repo are our source-of-truth which then get POST'ed to an API. We can enforce the schema of those files using pre-commit hooks so no one even wastes time opening a PR that will fail to POST to the API. The same JSON schema is also used by the API to ensure the POST'ed data is correct.
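A minimal sketch of that pre-condition idea, assuming the third-party jsonschema package (the schema itself is a made-up example):

    import json
    import jsonschema

    schema = {
        "type": "object",
        "properties": {"someSetting": {"type": "boolean"}},
        "required": ["someSetting"],
    }

    payload = json.loads('{"someSetting": true}')
    jsonschema.validate(payload, schema)   # raises ValidationError on bad input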


> First, it really depends what you're deserializing with. There is a lot of code out there that just does JSON.parse and then starts accessing the data and then you have an "undefined" get passed deep into the call stack where maybe it explodes or maybe the program just misbehaves.

I disagree, this example is just sloppy programming. Passing unvalidated data deep into a program is bad, I'm not arguing for that. What I'm saying is that you should be converting your unvalidated serialized data into a structured type right on the edge. Your data type/type system should __be__ your schema/validator.

> So if you're using a language like JavaScript or Python, then a JSON schema can be used to validate input right away. Think of it like enforcing a pre-condition.

This is what I do with python+pydantic:

    import json
    from pydantic.dataclasses import dataclass  # validating drop-in for stdlib dataclasses

    @dataclass
    class Foo:
        bar: int

    foo = Foo(**json.loads(json_buff))  # raises on missing or invalid fields
I'm not the biggest fan of pydantic here because you'll have to handle an exception for invalid data instead of an Option or Result in a better type system. But w/e.

> It's also useful in cases where JSON is being used for configuration files. At my company we have quite a few places where JSON files checked-in to a git repo are our source-of-truth which then get POST'ed to an API. We can enforce the schema of those files using pre-commit hooks so no one even wastes time opening a PR that will fail to POST to the API. The same JSON schema is also used by the API to ensure the POST'ed data is correct.

You can easily do this with serdes and a type library as well.

---

I guess schemas may be useful for crossing language boundaries, but you're going to need language specific types/objects at some point so why use schemas directly even then? (I think gRPC may have code gen tools for this purpose).


JSON is great, but I surely wish it supported comments. That's the nature of its failings: too minimal.


That depends on what you want it to be. For a data interchange format, having no comments is arguably a strength. For a config file format, having no comments is a big weakness.


Just do

  {
    "someSetting": true
    "comment": "TODO change to false when ready"
  }
Though really text-based protobufs are better for config.


Problems are that some tools will rewrite the file and reorder the "comment" away from what it's meant to comment on. Also might complain the "comment" item isn't expected there. I seem to remember package.json suffering from both of these under the control of npm.


Yeah it's definitely a limited solution, and I prefer that JSON be kept simple that way. Many package.json-adjacent configs like the Babel stuff can be in .js files, giving you comments and everything.


Many people use text format protobufs for config. It vaguely resembles yaml, and the schema is enforced by virtue of requiring a message definition in order to parse it.


This always bothered me. A coworker once suggested using fields ending in 'notes' to put in comments but I never really warmed up to that.


I have heard that too but it’s just a terrible idea.


Luckily a good number of parsers support extensions to JSON like comments and trailing commas.


Comments are simple to parse, but preserving them on the dump is complex. I guess they were sacrificed for the simplicity.


They were excluded so people wouldn't use them to insert meta-processing instructions into the JSON doc.

In reality people insert those meta-processing instructions in other ways.


That’s a good point. It would be hard to read the JSON, modify it and then write back with comments.

But you still should have the option to at least ignore them while reading. That would make JSON config files so much better to work with.


You can use JSONC, which is JSON with C-style comments.


The ambiguity difference around lists alone makes JSON over XML compelling.

It is simpler than XML/XSD. Without the schema, you never know if a certain element should be treated as being part of a list or not. When interoperating with anything other than XML, that matters.
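A quick illustration of that list ambiguity, using the third-party xmltodict package as an example: without a schema, one child and two children come back as different shapes.

    import xmltodict

    one = xmltodict.parse("<order><part_number>1</part_number></order>")
    two = xmltodict.parse("<order><part_number>1</part_number>"
                          "<part_number>2</part_number></order>")
    print(one["order"]["part_number"])   # '1'         -- a plain value
    print(two["order"]["part_number"])   # ['1', '2']  -- suddenly a list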


I dislike SOAP and avoid working with it when I can. However, the WSDL is an excellent part of SOAP that really makes it easier to work with. Teams tended to over engineer their APIs and all kinds of cruft would develop. I like the HTTP operations with REST.

I can remember hardcoding and manipulating a bunch of nonsense legacy fields just to get a ticket created via their SOAP enterprise service bus. Not to mention all the operations that made no clear sense.


soap may have taken liberties with http to get its work done (graphql: so what??) but it really felt like we reinvented the wheel. i was consuming massive wsdls in 2013/2014 and i consume massive open api specs in 2023. did anything actually improve?


Unless the implementation defines too many things as just "Object" and you're consuming from a stronger typed language... and the generated library doesn't give you anything resembling a real interface. I've used a dynamic language (Node) a few times to bridge such wsdl/soap services to consume from C# and similar.

Consuming SOAP/WSDL from languages other than the one it's published in isn't fun. Man, some of the PHP implementations were beyond horrible... well defined REST/RPC + JSON is generally much easier in the end.


> I dislike SOAP and avoid working with it when I can.

I disagree. I think personal hygiene is very important for in-office coworking.


> I dislike SOAP and avoid working with it when I can.

Well, I'm about to take a shower now, and shame on you.


What I appreciate compared to xml is:

  - generic concepts like arrays and maps
  - lack of opportunity to invent names
Every xml schema is a potential DSL that reinvents things like these.

Other than that, it's true that the xml era was just addressing a lot of important stuff early. I guess it was only compatible with the big corp mindset and not early web dynamic / fluid / small scale apps. (A bit like how PHP started to write PSRs to avoid dynamic code / effects in libs .. formalization etc.)


Every JSON schema is also a potential DSL that reinvents everything. Yes, there seems to be some convergence on things, but object arrays in XML aren’t really any more complex than object arrays in JSON — there just might be multiple ways to represent them.

For this JSON:

    {
      "part_numbers": [1, 2, 3, 4, 5]
    }
You have two main ways to represent these in XML:

    <!-- repetition = array -->
    <order>
      <part_number>1</part_number>
      <part_number>2</part_number>
      <part_number>3</part_number>
      <part_number>4</part_number>
      <part_number>5</part_number>
    </order>

    <!-- wrapped repetition -->
    <order>
      <part_numbers>
        <part_number>1</part_number>
        <part_number>2</part_number>
        <part_number>3</part_number>
        <part_number>4</part_number>
        <part_number>5</part_number>
      </part_numbers>
    </order>
Is this better than JSON? No, not particularly. But it’s no less clear than the JSON, and it compresses pretty well (it compresses better for larger documents, obviously).

The larger problem with XML is that the tooling is often lacking outside of Java and C#/.NET and none of the tooling is well-built for the sort of streaming manipulation that `jq` does (it exists, but IMO one of the least usable ideas from the XML camp is XSLT), and JSON support is pretty universal everywhere, even if the advanced things like JSONpath and JSON Schema aren’t.

I also think that there’s a problem when you have to choose between SAX and DOM parsing early in your process. Most JSON usage is the equivalent of using a DOM parser because the objects are expected to be relatively small, but many XML systems are built for much larger documents, and therefore need to parse the stream because the memory use otherwise would be unacceptable. The use of a JSON streaming parser is much rarer, IME.


Where XML shines is when you pass more complex data types than numbers and strings. If you repeated your example with an array of dates, strictly speaking you can't even generate the JSON. We'd first have to agree on what string representation of a date we want to use. For XML it's built into the spec.


In JSON the de facto standard for datetime is (because of JavaScript) very much the Unix msec timestamp (which is always in UTC), so while it's not hardcoded in the spec you basically need to be an idiot not to do it like that, and it removes one huge headache of XML dates, which is timezones.


I don’t think that I’ve ever seen msec timestamps passed around because JSON numbers are floats, which means that there’s a limit to the precision available (which is to imply as well that currency amounts should be passed as decimal strings in JSON for safety as well).

Suggesting that msec timestamps resolves timezone issues is naïve at best, because anytime you are passing something that refers to a real time (that is, it is significant to humans) rather than an instant time (that is, it is something like an event log timestamp), you are dealing with time in a particular place, which has human impact — cultural, legal, linguistic.

Passing around timestamps as RFC3339 UTC strings with timezone names and offsets (much like one should be doing in databases) is what would be recommended for real (human) times.
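A stdlib-only sketch of the two options (the exact output depends on when you run it):

    from datetime import datetime, timezone

    now = datetime.now(timezone.utc)
    print(now.isoformat(timespec="seconds"))   # e.g. 2023-04-06T12:34:56+00:00
    print(int(now.timestamp() * 1000))         # msec form: compact, but zone-less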


Okay, so the point at which you need to adopt a schema language in toy examples is earlier with JSON, but in most practical cases you’ll want to do that in either JSON or XML (because, even if you are only using built-in types, you’ll still want to communicate the shape), so this objection is kind of meaningless.


Well, no. Because JSON & by extension OpenAPI lack a Date type you can't easily add validations about dates to those schemas. Like you can't say this particular date must be in the past in an OpenAPI spec because it has no concept of a date. The best you can do is a regex on the strings you call dates but that falls apart pretty quick.


> Like you can't say this particular date must be in the past in an OpenAPI spec because it has no concept of a date.

I don't get these types of arguments.

There's zero reason you can't write code that parses a date in an expected format (and throws an error if the date is formatted incorrectly) and then checks that the date is in the past.

Yes, it does mean you'll spend time writing more code (You know, the job you're being paid to do?), and it would be nice if your data format supported such automatic checking functionality out of the box, but to say "It can't be done!" is just plain silly.


> "It can't be done!" is just plain silly

It's a good thing I didn't say that.

>Yes, it does mean you'll spend time writing more code

The whole point of WSDLs and OpenAPI is to minimize the amount of time it takes to consume your API. Saying you have to write more code is highlighting the shortcomings of OpenAPI at doing the only thing it's built to do. Which is why companies have largely punted on providing OpenAPI specs in favor of maintaining libraries in a handful of popular languages.


I have literally never used any of these things.

The hate I have for XML is the high markup overhead. Anybody who has configured a trunk of the century product with XML config files knows what I mean; the screen is usually 2/3 XML tags, which means 1/3 closing tags, which add nothing semantically


> but now we've got JSON Schema, Swagger/OpenAPI, Zod, etc etc. It's not really simpler and there's a lot of manual work - we might as well be using XML, XSD & SOAP/WSDL.

Uh... do we? I've never used any of those. Plain JSON has always worked fine for me.


> but now we've got JSON Schema, Swagger/OpenAPI, Zod, etc etc.

You don't have to use any of those.


You don’t have to use anything for XML, either. The simplest XML document is almost indistinguishable from the simplest JSON document. Nothing in XML requires XML schemas or namespaces or anything else that is usually attributed to the complexities of XML.


I have to say that I was bit disappointed the first time I learned about JSON Schema. My immediate reaction was to wonder if they were trying to become XML.


OpenAPI is complex not because of JSON, but because it's a nearly complete description of http.


… and have proper comments.


My favourite Douglas Crockford quote, from a debate back in 2006 about why JSON was reinventing the wheel when XML already existed:

> The good thing about reinventing the wheel is that you can get a round one.

https://simonwillison.net/2006/Dec/21/crock/


> Turned out JavaScript was the first language to give us lambdas, and that was an amazing breakthrough.

I mean... with charity I can see the context and get it. But. What!?

Overall a fun read through history, even if definitely from Doug's perspective only. (As evidenced by JavaScript being an originator of lambdas...) I do find the idea that JSON was as novel as history says it was kind of odd. I remember inlining JavaScript objects years before "JSON" was a thing. Making it a subset of what JavaScript could already do seems straightforward and a good execution. Getting rid of comments feels asinine to me. (I'll also note that the plethora of behaviors you get from JSON parsers shows that it is effectively CSV. Sure, there may be a "standard" out there, but by and large it is a duck-typed one.)

I'm also a bit in the camp that XML is better than JSON. Better datatypes, for a start. Schemas that allow autocompletion. It's also easier to see as a markup language (per the name). That said, they clearly went too far with entities, and despite making sense for markup, attributes versus children are more than a touch awkward.

I also recall that what killed XML and WSDL files in general was the complete shit show of getting a single document to work with both MS and non-MS clients.


Crockford mentions Scheme right before that, so he's aware lambdas originated with Lisp, presumably. I guess he means JS was the first mainstream language to popularize them?


Yeah, that is why I think I can see the point with charity to the discussion. Still an awkward proclamation. Many people were coding with LISPs for a long time before javascript came onto the scene. And I don't think LISPs were the only language with lambdas?


The current XML standard is hot garbage since it completely disallows null characters, even via "&#0;", despite most languages now supporting nulls in the middle of strings. Also, JSON definitely allows schemas, primarily through the JSON Schema standard, but I've also seen TypeScript notation used for this, which has the convenience of being readable by more people (I strongly suspect more people know TypeScript than know XML schemas or JSON schemas combined).


JSON is garbage to read, largely due to how much needs escaping. This is mostly fine for smaller documents, but there is a reason YAML and TOML both gained traction over raw JSON for config files.

And I don't make any real defense of some of the darker corners of XML. In particular, I already criticized entities being a bit too much. Namespaces are also something that, while I can see the desire, the implementation is way too much for most of us.

JSON schema is going to be cursed for a long time. Just the odd treatment of it will be a problem. (In particular, that it is a subset of the numbers that javascript itself supports is... awkward.)

I also confess, though, that I'm not clear on why I would want a null in the middle of a string. That feels like a gun loaded and aimed squarely at a foot.


Most languages (C#, Java, Rust, JavaScript, etc.) support nulls in the middle of strings so it can be a security vulnerability if you try to serialize untrusted input to XML. I'd much rather be able to encode anything my input language considers a string and deal with excessive escaping than need to worry about what I'm going to do with inputs that my serialization language cannot support.


I'm curious what the vulnerability is? Also not clear what the null character is. Any links I can follow?

And again, if this is your line in the sand, how do you serialize NaN and Infinity in JSON?

Edit: Playing with this a bit, I'd actually assume that allowing \0 would be a vulnerability. I was curious how browsers treat it, so I see that parsing to an html document seems to just drop the characters? Fun little rabbit hole to jump in!


Yeah, that's why I consider it to be a breeding ground for vulnerabilities. People will probably just assume the XML serializer can handle any string in their language of choice and not handle those edge cases. What I ended up doing for my use case was to encode nulls as "&#0;" but within a CDATA section, so it was interpreted literally (choosing ambiguity over omission). The best way would probably be to have some sort of special <null /> element, but there is no such thing in the standard. There is xsi:nil, but that really indicates something else.
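
Roughly, the workaround looked like this (an illustrative Python sketch; the receiving side has to know to undo it, which is where the ambiguity comes in):

    def escape_nulls(text: str) -> str:
        # Emit the literal characters "&#0;" inside CDATA so the document stays
        # well-formed; a real NUL byte would make the XML invalid outright.
        return "<![CDATA[" + text.replace("\x00", "&#0;") + "]]>"

    escape_nulls("a\x00b")   # '<![CDATA[a&#0;b]]>'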


But what is the vulnerability? And what is a null character doing in a text document?

If you are just worried about data loss, having null allowed in text segments is already begging for failure, as C programs will almost certainly get them wrong.

If you are transferring binary, base64 or similar will already cover you.

And again, if this is a strike on xml, how do you represent NaN in a JSON document? Do what DynamoDB does and wrap all numbers in quotes?
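
For what it's worth, Python's standard library illustrates the awkwardness: by default it emits the non-standard NaN/Infinity literals, and in strict mode it just refuses, leaving you with the quote-it or null workarounds.

    import json

    json.dumps({"x": float("nan")})                   # '{"x": NaN}' -- not actually valid JSON
    json.dumps({"x": float("nan")}, allow_nan=False)  # raises ValueError
    json.dumps({"x": None})                           # '{"x": null}' -- drop the value
    json.dumps({"x": "NaN"})                          # '{"x": "NaN"}' -- quote it, DynamoDB-style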


From another interview:

>The best thing we can do today to JavaScript is to retire it. Twenty years ago, I was one of the few advocates for JavaScript. Its cobbling together of nested functions and dynamic objects was brilliant. I spent a decade trying to correct its flaws. I had a minor success with ES5. But since then, there has been strong interest in further bloating the language instead of making it better. So JavaScript, like the other dinosaur languages, has become a barrier to progress. We should be focused on the next language, which should look more like E than like JavaScript.

- https://evrone.com/douglas-crockford-interview

One of the traits that makes Douglas great is being willing to say the obvious even if it is politically unpopular.


Oh, hey. That's cool. I hadn't realised Douglas Crockford worked on E. I haven't actually looked, but I wonder who else participated?

E had some really cool ideas, it's sad that it doesn't seem to be that well known!


The biggest impedances I see to replacing JS are:

1. You've got to keep JS around for backwards compatibility for the billions of websites already using it.

2. You will need two engine teams, one to maintain JS and one for the new language.

3. Now you have a whole new vector for security issues. You've made the threat surface much broader. So, you will probably need to hire additional people.

4. You need to coordinate with all the other browser makers so everyone rolls out their new engines more or less concurrently. Other than experiments, nobody is going to start using it unless it works on all the major browsers and platforms.


That depends on the language you choose.

If we went to a scheme dialect as originally intended, we could have just ONE language for all the things.

Legacy JS? Just compile it into Scheme and run it.

HTML? Use S-expressions and support legacy HTML syntax by compiling it into them. Now you get all the power people want from template languages, but baked right into main language itself.

CSS? No more weirdness like adding sin() or calc() to make up for shortcomings. Once again, you get the power of the full Scheme language right there.


While both are good fits for their specific use cases, I think JSON won as a medium of exchange because, unlike XML, JSON is dead simple to parse and ingest programmatically.

What makes XML so unergonomic to ingest is 1) attributes, which don't map cleanly to a basic data structure that you might find in a programming language, and 2) namespaces, which are extremely, extremely tedious to program against.

Programmers are going to use the format that's the easiest to ingest and manipulate. JSON wins in that regard, hands down. Every time I need to write logic to ingest a namespaced XML document I heave a deep sigh and brace myself for another long week of fighting with LXML. But with JSON it's as easy as `json_decode($str)` and move on with your life.
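
To make the comparison concrete, a sketch with lxml (the namespace URI and element names are invented): every query against a namespaced document has to drag the namespace map along.

    import json
    from lxml import etree

    doc = etree.fromstring(
        b'<r:recipes xmlns:r="http://example.com/recipes">'
        b'  <r:recipe name="omelette"/>'
        b'</r:recipes>'
    )
    names = doc.xpath("//r:recipe/@name",
                      namespaces={"r": "http://example.com/recipes"})

    # versus
    names = [r["name"] for r in json.loads('{"recipes": [{"name": "omelette"}]}')["recipes"]]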


Namespaces, schemas, custom elements, client-side templating: XML has so much stuff that the web threw away, so now it's forced to reinvent worse versions of it every few years. A shame.

Abandoning XML was the web's biggest mistake.


Whenever XML gets discussed here it's interesting to see what people complain about. In my completely unscientific assessment most things people hate(d) were the overwrought "Enterprise" uses/systems.

Very unfortunately for everyone XML came up at the same time as peak "Enterprise" moat building. No design pattern went unused everything was built with mind numbing "configuration". XML got used heavily in that space because it allowed massive "Enterprise Objects" (local branding varies) to be serialized in a way another system might have a chance to read.

Meanwhile the features you mention got thrown out with the bathwater because everyone hated Enterprise-style architectures. While I don't love everything about XSLT, for instance, it's built directly into browsers as native code. How many person-hours, megabytes of JavaScript, and wasted CPU cycles have been spent reinventing client-side templating using JSON? XSLT is already right there and will happily convert serialized data to your presentation format. You also get the ability to have comments in the data and built-in schema validation.
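
For illustration, that "serialized data to presentation format" step can be this small (the data and stylesheet are invented, and lxml stands in here for the engine browsers already ship):

    from lxml import etree

    data = etree.XML('<recipes><recipe name="omelette"/></recipes>')
    stylesheet = etree.XML('''
      <xsl:stylesheet version="1.0"
          xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
        <xsl:template match="/recipes"><ul><xsl:apply-templates/></ul></xsl:template>
        <xsl:template match="recipe"><li><xsl:value-of select="@name"/></li></xsl:template>
      </xsl:stylesheet>''')

    print(etree.XSLT(stylesheet)(data))   # <ul><li>omelette</li></ul>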

On my current project I'd much rather be emitting and consuming XML than JSON. But alas, everyone hated Enterprise XML, so we're stuck with JSON: parsers that can't handle trailing commas, ambiguous definitions of numerics, and not a comment to be found.


XML is oversized for the majority use case.

It's easier to extend a simple standard than to amputate a behemoth with unneeded appendages.


The problem with extending a standard is that there are so many ways to do it.


> And after years of being too early at everything, the world had caught up to Doug.

Have we though? Earlier, the article even has Douglas saying:

> It turns out it, well, it’s a multi paradigm language, but the important paradigm that it had was functional. We still haven’t, as an industry, caught up to functional programming yet. We’re slowly approaching it, but there is a lot of value there that we haven’t picked up yet.

I do love the very ending:

Adam: What do you think is the XML of today?

Douglas: I don’t know. It’s probably the JavaScript frameworks.

They have gotten so big and so weird. People seem to love them. I don’t understand why.

For a long time I was a big advocate of using some kind of JavaScript library, because the browsers were so unreliable, and the web interfaces were so incompetent, and make someone else do that work for you. But since then, the browsers have actually gotten pretty good. The web standards thing have finally worked, and the web API is stable pretty much. Some of it’s still pretty stupid, but it works and it’s reliable.

And so, when I’m writing interactive stuff in browsers now, I’m just using plain old JavaScript. I’m not using any kind of library, and it’s working for me.

And I think it could work for everybody.

------

Earlier in the interview, where they were talking about how the people behind XML and SOAP wanted complexity and were upset by the simplicity of JSON, I was thinking that this resonated with me and how I feel about how complex web development has become with babel/webpack, transpiling, react/vue, etc. It feels like complexity for complexity's sake.


One of the reasons to prefer JSON over XML is that you can reasonably parse untrusted JSON using the default configuration without getting yourself pwned. A lot of XML processing libraries still enable external entities by default, so you have to disable them manually: https://cheatsheetseries.owasp.org/cheatsheets/XML_External_...


> you can reasonably parse an untrusted JSON using default configuration without getting yourself pwned.

If only this were true.

https://medium.com/r3d-buck3t/insecure-deserialization-with-...


I know that one, but I think JSON.NET is to blame for this because it decided to take `$type` and other fields and apply some reflection magic to them. It isn't really different from evaling a random JSON field in your own business code. A lot of sane JSON implementations don't do this either, like `JSON.parse`, `json.loads`, `json.Unmarshal`...

On the other hand, XML External Entities are part of the XML standard, so any standard-compliant XML implementation has to support them. This is why XXE attacks apply to many languages.


If it's not obvious, the issue is that standardizing a data format is going to have trade offs. Interoperability, leveraging tooling universally so all effort is going in the same direction, awesome. The problem is that some uses cases for the format are going to be insanely complex, which will make the standard and tools unnecessarily complex for the simple cases.

JSON is simpler and easier for many cases, but then you lose the interoperability. Go try to make an app right now dealing with Federal government systems or finance, you're going to end up translating JSON<->XML which isn't fun.

There's not going to be a silver bullet solution to this problem, it's not completely solvable.


> you're going to end up translating JSON<->XML which isn't fun.

Not fun? It's not even possible in the general sense.

If you have XML that looks like:

    <meal type="breakfast">
       <eggs count="3">
           <topping>cheese</topping>
       </eggs>
    </meal>
How would you convert that to JSON without knowing how the JSON consuming application expects it to be formatted? Where do you put the "breakfast" and "count" attributes?

You'd need to manually write a translator for each potential translation.
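
For instance, both of these are defensible renderings of that document (the second keeps attributes distinguishable with an "@" prefix, a convention some converters use), and a consumer written for one won't read the other:

    {"meal": {"type": "breakfast",
              "eggs": {"count": 3, "topping": "cheese"}}}

    {"meal": {"@type": "breakfast",
              "eggs": {"@count": "3", "topping": {"#text": "cheese"}}}}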


> You'd need to manually write a translator

Yep, therein lies the “not fun”. You write a bunch of super complex, brittle code.

Unfortunately because XML is entrenched in certain domains, you have to decide between writing these converters or doing everything in XML which also sucks, especially if you’re trying to write a modern app with a modern stack.


I remember one time designing the simplest and most readable data format ever, and coming up with Dixy [0] after removing everything I could while still keeping it usable.

I'm leaving it here because it will never be used for anything, but at least it may inspire somebody to design a better format with simplicity in mind.

[0] https://github.com/kuyawa/Dixy


This looks a lot like YAML, especially with the non-quoted strings, colons, and indentation. It also seems to share the problems of YAML, namely a very non-uniform syntax. For example, how do you distinguish null (denoted as "?") from a literal string containing one question mark? How do you distinguish the number 1 from the string "1"? Hence why I'm not a fan of both YAML and Dixy.

Other problems to ponder: Is 0 different from 00? Is "1, 2, 3, 4" different from "1,2,3,4"? Is "a: b" different from "a : b" and "a:b"?


I like this! It's like YAML but you can learn the entire spec in 15 seconds.


Why will it never be used for anything? I like it. Thank you for sharing.


"So Netscape thought they could do a similar thing for their navigator browser that, if they could get people programming in the same way that they did on HyperCard, on the browser, but now they can have photographs and color and maybe sound effects, it could be a lot more interesting, and you can’t do that in Java."

It's like the man never tried. Try a Java enabled browser: https://www.wikihow.com/Enable-Java-in-Firefox

Just as a reminder Minecraft (the most sold game in history) started out as an Applet.

Applets were not horrible because of the underlying technology; they were horrible because people made bad things with them, just like J2EE was a bad thing people made with J2SE.

But sometimes, rarely, people would make beautiful things with J2SE and J2ME and those are now removed from history forever under the banner of security like everything else that is good in life.


I've met Douglas a few times at JS conferences, and he is an excellent engineer (read up on his work on the NES version of Maniac Mansion). However, this passage about starting a company and trying to raise capital from VCs demonstrates that even excellent software engineers can be surprisingly myopic, dismissive, and naive about software businesses.

> Douglas: For me, the most difficult thing was raising money. You’re constantly going to Sandhill and calling on people who don’t understand what you’re doing, and are looking to take advantage of you if you can, and they’re going to do that, but you have to go on your knees anyway.

> I found that stuff to be really hard, although some of them I really liked. And sometimes I’d be sitting in those meetings and I’d be thinking, “I wish I was rich enough to sit on the other side of the table, because what they’re doing right now looks like a lot more fun than what I’m doing right now.” And it was even more difficult raising money then, because at this point, the dot-com bubble had popped and all VCs had been hurt really badly by that. So they were only funding sure things at that time, in late 2001, early 2002.

> And I thought we were a fairly sure thing, because we had already implemented our technology. And by this point, Chip and I understood the problem really well. And we had a new server and JavaScript libraries done in just a few months. And we had demonstrations. We could show the actual stuff. So it wasn’t like we were raising money so that we could do a thing. We had already done the thing, we needed the money so that we could roll it out. And that wasn’t enough for them. They wanted to see that we were already successfully selling it. And I was like, “If we could do that, we wouldn’t need you.”

Only they hadn't. They had built a demo of what we would later call a web 2.0 app. It wasn't even an application that solved a business problem or did anything specific. It was just showing the concept. That's not a product and that's not a business. The VC's point was: Show us proof that this idea has tangible benefits people will pay for.

The biggest misconception about VCs is that you raise money to "successfully sell" something you've built. You don't. You raise VC money to scale something that has value. So you need to communicate the business value, and ideally have proof points (either in the form of sales or data) that prove the value.

Of course Douglas found raising money difficult. But he doesn't seem to have the self awareness that this was probably due to him, and not the rich suits on the other side of the table.


For me 3 killer features of JSON are:

1. Parsing JSON doesn't require adding new firewall rules

2. There are no comments, so nobody will try to invent their own meta format or annotations in comments and instead they will put data in the JSON as they should

3. (When compared to JS) someone finally had the balls to pick one type of quotes, which makes writing a parser so much simpler.


Not supporting comments in JSON was a huge mistake. Yes I'm sure that someone, somewhere has once added comment directives to a file that caused issues. But that's such a rare problem compared to the very real and damaging and annoying problem of not being able to add comments to config files (hello package.json) that it's definitely the wrong choice.

XML supports comments and I have not seen a single use of comment directives in it ever.

I have seen plenty of comment directives in programming languages, HDLs and so on. But they are usually used as hints, e.g. to linters or to control compiler warnings, and they work perfectly well and cause no problems at all in my experience.

You might say that Crockford didn't anticipate JSON being used for config files. Fair enough. But now that it is, it should support comments.

My recommendation is to use JSON5 since it has a distinct file extension and fixes some other things about JSON too (e.g. trailing commas, hex constants) without being full on YAML insane.
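
For reference, a hypothetical JSON5 config showing what that buys you:

    {
      // comments, at last
      target: "es2020",      // unquoted keys
      include: [
        "src",
        "tests",             // trailing commas
      ],
      maxWorkers: 0x10,      // hex constants
    }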


directives can just be put in adjacent fields anyways


> There are no comments, so nobody will try to invent their own meta format or annotations in comments and instead they will put data in the JSON as they should

It also means it's a worse format for configs, where you sometimes need to annotate a few nodes with comments.


Yep.

"comment": keys get littered across the JSON... or temporary changes are copied and the original property name is invalidated with a prefix. The simple structure is gone, replaced with ad-hoc workarounds.

Similarly when you want to use a type not supported by JSON such as datetime or binary data, you might end up with "type":"binary" and use base64 or whatever in the value (shoehorning attribs) - when it really needs a schema to follow during parse and stringify. Or OpenAPI, which is hardly lightweight and really doesn't match the simplicity of JSON.


For configuration files I add an extra key "_" with the comment string as its value. I even add multiple "_" keys to the same object and have never seen anything break.


This is just a terrible comment format isn't it?


JSON could really use schemas as part of the main implementation.

Local schemas, not crazy remote schemas.

Or some sort of way to bless an "official" schema format.


My biggest gripe with XML is that it can't represent arbitrary strings easily. Even in the latest versions of XML, you can't easily serialize strings with embedded nulls since it is forbidden by the spec to even use something like "&#0;". XML 1.0 was even worse since it doesn't allow any characters which require surrogate pairs under UTF-16. Instead, the spec writers apparently expect devs to come up with their own escaping scheme in which case why bother having a standard at all?

Even C# just punts on this issue and won't emit valid XML if a string you serialize happens to have a null character in it.


If I had to deal with strings that XML won't allow, I'd probably just rely on encoding the data in Base64 before throwing it into the XML.

A human won't be able to read it (Unless you're crazy and have learned to read Base64), but the application still can easily. You'll just have to add a Base64 translation step before/after serialization/deserialization.
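
A minimal sketch of that translation step in Python (the element and attribute names are made up):

    import base64
    from xml.etree import ElementTree as ET

    raw = "payload with a NUL \x00 inside"

    elem = ET.Element("blob", encoding="base64")
    elem.text = base64.b64encode(raw.encode("utf-8")).decode("ascii")
    xml = ET.tostring(elem)   # b'<blob encoding="base64">cGF5bG9h...</blob>'

    decoded = base64.b64decode(ET.fromstring(xml).text).decode("utf-8")
    assert decoded == raw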


It's very annoying to do that though since that introduces a bunch of logic in the application and also removes the benefit of being able to read the strings in the XML as a human.


I worked on a customized ejabberd at a company for years, drinking all the XMPP kool-aid and becoming very familiar with XML along the way. Slowly we all began to realize how bad XML was. We eventually put our custom extensions' data into JSON just embedded inside the XML. Says a lot that such a hack was actually an improvement.

The other two premier XML use cases I can think of are

1. RSS: Last time I did this, ironically I built the payload with a JSON-API'd lib that deals with the XML drama for me. Worked fine.

2. Configs. Rarely are these done in XML anymore. Human readability matters for configs. But there are also better options than JSON for this.


3. HTML-like things where XML actually makes sense cause you're defining some sort of document with reusable objects that gets rendered at the end.


I bought one of the first books about XML, read it cover to cover; started writing my own parsers and generators, designed a custom XML protocol for a network server at work.

Then I had to live through the whole SOAP-drama, and Java EE; and ended up promising myself to never touch it again.

It has too many degrees of freedom for its own good, the C++ of data formats.

JSON is in many ways the other end of the spectrum; simple but underspecified and painful to deal with in anything but JS.

I often dream of something in-between.



What if I hate both formats? XML is overly verbose, while JSON isn't specific enough or precise enough for a lot of my needs.

- This message brought to you by TOML gang


I just checked out the spec, and it gets pretty ugly in the Table section. A lot of the json examples are both shorter and IMO more precise. Stuff that’s not allowed with [table] is allowed with [[table]], and it’s confusing to understand what level of depth I’m at.

I’ll take edn over any of ’em. https://github.com/edn-format/edn

Comments and time stamps allowed, arbitrary nesting of data structures, make your own tagged literals if you need them. And commas are whitespace, mostly unnecessary.


I've got yet another markup language for your hate group's target list :)

Come join the dark side where we enjoy the wonders of binary formats such as avro and protobuf.


I actually love binary formats, especially for network communication. We probably waste tons of processing power and network bandwidth needlessly sending JSON back and forth everywhere and re-deserializing it. I'm personally a fan of MessagePack.

Though for something where you want human readability it's hard to beat TOML in my opinion.



Absolutely agree. TOML is far and away the best for config files.


You can't be serious; I can't stand having to guess what kind of crazy markup is required to express things in toml. As a concrete example I converted my local kubeconfig (which is yaml) and here are the completely random characters indicating some kind of hierarchy

    apiVersion = "v1"
    current-context = ""
    kind = "Config"

    [[clusters]]
    name = "my-cluster"

    [clusters.cluster]
    certificate-authority-data = "LS0tL..."
    server = "https://example.com"

    [[contexts]]
    name = "context0"

    [contexts.context]
    cluster = "my-cluster"
    user = "my-user"

    [[contexts]]
    name = "context1"

    [contexts.context]
    cluster = "my-cluster"
    user = "my-user"

    [[users]]
    name = "my-user"

    [users.user]
    [users.user.exec]
    apiVersion = "client.authentication.k8s.io/v1beta1"
    args = ["eks", "get-token"]
    command = "aws"


As the other person who replied to you noted, a converted-from-YAML file, where the service is designed to use YAML rather than TOML, is hardly a good example of TOML. Of course your example is bad.

At least use a native toml file as an example.


It is really just arrays of objects/dicts that get complicated with TOML. For dictionary properties not inside an array, you can either fully specify the path `foo.bar.baz = 7`, or use a header like `[foo.bar]` and specify `baz = 7`.

Also, if I were handwriting that, I would probably make more use of dotted property names implying dictionaries, like so, which, though it has a bit more repetition in property names, seems easier to read:

    apiVersion = "v1"
    current-context = ""
    kind = "Config"

    [[clusters]]
    name = "my-cluster"
    cluster.certificate-authority-data = "LS0tL..."
    cluster.server = "https://example.com"

    [[contexts]]
    name = "context0"
    context.cluster = "my-cluster"
    context.user = "my-user"

    [[contexts]]
    name = "context1"
    context.cluster = "my-cluster"
    context.user = "my-user"

    [[users]]
    name = "my-user"
    user.exec.apiVersion = "client.authentication.k8s.io/v1beta1"
    user.exec.args = ["eks", "get-token"]
    user.exec.command = "aws"
If k8s had been designed with TOML in mind, it probably would have been structured differently, such that "contexts", for example, might just be a dictionary mapping names to an object holding the values from the "context" property. (The existing pattern of an array of objects where each object has a name but stores most of its properties in a property whose name matches the object's type is already weird, but doesn't look terrible in YAML.)

Such a schema, redesigned to be more TOML-friendly, would then look like this:

    apiVersion = "v1"
    current-context = ""
    kind = "Config"

    [clusters.my-cluster]
    certificate-authority-data = "LS0tL..."
    server = "https://example.com"

    [contexts.context0]
    cluster = "my-cluster"
    user = "my-user"

    [contexts.context1]
    cluster = "my-cluster"
    user = "my-user"

    [users.my-user.exec]
    apiVersion = "client.authentication.k8s.io/v1beta1"
    args = ["eks", "get-token"]
    command = "aws"


Json is just hipster xml. Jq is just hipster xslt.

Somebody should add a json entry to "the ascent of ward" [0]. Of course, it will be longer than all the previous versions combined, and the fields will appear in random order because dictionary.

[0] http://harmful.cat-v.org/software/xml/


To me, Douglas Crockford is the unofficial grandfather of JS. He is amazing, and I love hearing him speak!


> The success of JSON was totally serendipity. Getting the domain name definitely helped. There are some things that I didn’t do that definitely helped. I didn’t secure any intellectual property protection on it at all. I didn’t get a trademark for the name or for the logo. I didn’t get a copyright for the specification. I didn’t get a patent for the workings of the format. My idea was to make it all as completely free as possible. I don’t even require any kind of notice. No one has to say, “Thank you, Doug, for doing that.” It’s just free for everybody. And I think that definitely helped.


I'll never understand the hating that xml tends to get around here.

Choose the right tool for the job at hand. Sometimes json is the right choice, sometimes xml is. Not everything is a webapp.


It's an ugly tool. People generally hate ugly tools


based on the overwhelming majority of the top 30 comments, i think you should feel comforted.


What a pointless debate. I've worked with XML manuscript archives and I can be certain that if I'd had to do it in JSON I'd have killed myself.


This is about JSON being created or discovered and Doug struggling to convince people it was relevant when everyone was so bought in on XML.

Are you saying you think JSON shouldn't exist and everyone should use XML for everything?

Tooling around XML was certainly more established, but man there was a lot of complexity built up around it.


No. JSON is great as Javascript's serialization format, but it's not as readable and robust as XML, period.

I use both extensively, and for bigger objects and definitions, XML is a very clear winner.

I'm a big believer in a horses-for-courses type of approach, and my personal gripe is the push to replace one thing with another. These data types can coexist and can be used where they shine. XML can be read and written stupidly fast, so it's way better as an on-disk file format if people are gonna touch that file.

YAML and JSON are not the best fit for configuration files. JSON is good as an on-disk serialization format if humans aren't gonna touch it. XML is the best format for carrying complex and big data around. TOML is the best format for human-readable, human-editable config files.


My only quibble is that both are basically unreadable in most use cases. Most programs worth anything that use these formats strip out all the extra spaces and formatting, so you usually have to take an extra step to 'reformat' just so you can read it. And anyone who has hunted down a stray or missing paren or angle bracket knows how painful manually parsing a 400+ field file in one of these is. Trying to say one is better than the other ignores the use cases for both: one is good at getting data into JavaScript/Python, the other at light typing, annotation, and transforms.


I've never seen a tool that stores its XML config in a minified/uglified form by removing whitespace. The two biggest tools I play with which use XML are Keycloak and Eclipse, and neither of them does this.

All of the parsers I've used, and the editors I've edited XML in, have always shown the correct place where an angle bracket is missing or the XML is broken in any way, so I have never had to hunt anything down inside a big XML file.

However, this doesn't invalidate your experience about unreadable XML files, which are most definitely present in the wild.

However, I agree that none of them are good config file formats, but for storing data, I'll take XML all day, every day (except when I really need a binary file format, e.g. for compressing data).


What specifically about XML means it can be read/written "stupidly fast"?

It's still a text bound serialization format, you still have to parse a tree for it.

Is it just particularly mature libraries?


It is primarily mature libraries, but XML is also more straightforward to parse, because there are not many data types and the tags make it very deterministic.

By "stupidly fast", I mean I can read a 120K XML file, parse it, and create the objects defined in that file in under 2 ms. The library I use (RapidXML [0]) can parse the file in almost the same time it takes to run strlen() on it. That's insane.

[0]: https://rapidxml.sourceforge.net/


Being a maintainer of the fastest XML library for Rust, I strongly disagree that XML is inherently fast to parse, and I question any such claim which comes with no evidence. Especially when it has remained unchanged on their page since (at least) 2008 [0]. Have you actually tested that claim or are you taking it at face value?

IME the XML spec is so complex that you either end up with a slow but compliant parser or a fast one that doesn't implement the spec completely.

JSON, unlike XML, is minimal enough that writing an entire compliant parser with SIMD intrinsics [1] is actually practically feasible. That library claims 3 GBps parsing speed, which could theoretically process your 120kb of data in 1/25000th of a second instead of 2/1000ths of a second.

I would wager that JSON is faster to parse, on balance.

[0] https://web.archive.org/web/20080209172554/https://rapidxml....

[1] https://github.com/simdjson/simdjson


YAML is excellent as a post-natal abortion mechanism. Anyone working on a parser for it will question why to go on living when YAML exists. Source: I'm developing a YAML parser.

What broke me were: plain string and empty node handling.

Here is a fun quiz. Which of these two documents is valid: the first, the second, both, or neither? With explanation, ofc.

Yaml#1

     :
Yaml#2

    :


XML is great except at being a configuration format, a messaging format, a serialization format, or any other purpose really. It's not insane like YAML I'll give it that. I'll take XML over that garbage any day.


The complexity of XML reminds me of something from Adam Bosworth's ISCOC04 talk [0]. To me, the big takeaway is that HTML succeeded because of its limitations, not despite them. JSON seems very simple compared to XML. XML seems to be very powerful, but also very complex; it's like, if all you need to do is pick your kids up from soccer practice, you don't need the power (complexity) of the Space Shuttle in your vehicle.

  In 1996 I was at some of the initial XML meetings.
  The participants' anger at HTML for "corrupting"
  content with layout was intense. Some of the initial
  backers of XML were frustrated SGML folks who wanted
  a better, cleaner world in which data was pristinely
  separated from presentation. In short, they disliked
  one of the great success stories of software history,
  one that succeeded because of its limitations, not
  despite them. I very much doubt that an HTML that had
  initially shipped as a clean layered set of content
  (XML), layout rules (XSLT), and formatting (CSS) would
  have had anything like the explosive uptake.

[0] https://adambosworth.net/2004/11/18/iscoc04-talk/


But you don't have to use any more of the XML-related standards than you want to. You can ignore schemas, and add-on technologies like XPATH and XSLT and just use XML as a hierarchical tag-value format, just like JSON.

At this level they are both about equal in complexity: JSON has data types that XML doesn't, and XML has attributes and CDATA that JSON doesn't. JSON syntax is more succinct, but XML syntax is more regular.


XML is good for documents that don't have a regular markup (XHTML, DocBook, JATS, MathML, etc.) where you can mix content elements -- e.g. italic annotations.

JSON is good for structured data/records such as serialized data structures found in RPC protocols.

They both have their own pros and cons that make them suited to different use cases. Choose the one that best suits your data model and use cases.


Debate? Did you even read the article? It was about the history of how JSON came around. I didn't read a debate (despite what the title implies).


Honestly, I would relegate XML to application configuration. Trying to communicate with it with something like HTTP requests/responses is absurd.


I just had to comment on the irony of this comment being embedded in a document that is delivered via HTTP and very close to valid XML.

Even if XHTML fell by the wayside, HTML is imho a stereotypical example of where XML is a good fit. Most of the complexity has valid use cases, and it's mostly obvious what should be an attribute and what should be content of the tag. And at least in HTML 4 you even had a doctype tag filling the role of specifying the schema used. Of course, SVG is a better showcase for some other aspects of XML, with every editor putting its own metadata in, nicely partitioned into separate namespaces.


In broad strokes, I suppose you're right to see the irony. Even with that, we need specific client applications (aka browsers) to translate that into something readable.


(X)HTML can be pretty readable, if it's formatted well. If it wasn't, JSX wouldn't be a thing.

I think it's not so much about readability but about complexity. XML is meant to represent complex data, like complex rich text or nested vector graphics. That makes XML complex, conceptually, visually, and in implementation. If you use it to represent something that could have been a csv you're going to have a bad time (as everyone had in the 90s).


> HTML is imho a stereotypical example where XML is a good fit

Indeed, this was what XML was created for. From W3C's XML specification:

> The Extensible Markup Language (XML) is a subset of SGML that is completely described in this document. Its goal is to enable generic SGML to be served, received, and processed on the Web in the way that is now possible with HTML.

Honestly, what's absurd is GP comment's cluelessness.


When you can get the data as XML, verify via its schema externally and then transform it via XSLT, it's not.

Also, it's way better in transferring/storing big, complex intricate data like 3D objects.


I remember playing with invoice data in XML, sending it via email and opening it in a browser, where it was beautifully shown in all its visual glory using an XSLT directive in just one line at the top of the data file. Absolutely amazing. I wondered about all the implications and applications of just transmitting data that knew how to present itself to the user.
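
That one-line directive is the xml-stylesheet processing instruction; something like this at the top of the file (the href pointing at whatever stylesheet you ship alongside the data):

    <?xml version="1.0"?>
    <?xml-stylesheet type="text/xsl" href="invoice.xsl"?>
    <invoice>
      ...
    </invoice>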


> Also, it's way better in transferring/storing big, complex intricate data like 3D objects.

Curious how come?


I have a project which can work on 3D objects imported from STL files. My file format has more metadata and much more detailed information than a standard STL file, and all the data and metadata can be written in a way which is both readable and modifiable by a human if need be.

Having the same tags many times means the file can be nicely compressed, and its being XML means it can be verified independently with a schema (and the schema can be hosted at a remote location over HTTP if need be), too.

You can always store data more efficiently with binary formats, but XML DOM parsers allow accessing arbitrary parts of the tree instantly, so working with it is both easy and fast at the same time.


Thank you!


Agreed. That and knowledge representation (ontologies) is another good use case for XML, since JSON can't natively represent attributes (has-a relationships).


I'm in a similar sphere with XML and ontological representation. I've inherited maintenance of an ontology (of sorts) that has been used in social sciences since the 1940s. Can I ask what domain you are in? How do you like to represent your ontologies? SKOS?


Oh, I'm not using ontologies at all hah. Just interested in the idea of knowledge representation.


Doug says there's not a conservation of complexity, but I kind of disagree with that - the problem he was getting hung up on back in 2000 was that the original XML complexity was frickin' useless but the consultants and the capital were trying to keep it around anyway. If you don't know why a complex condition exists, you can't abstract the complexity away.


> Douglas: [...] It doesn’t look like it should be complicated. It’s just angle brackets, but the semantics of XML can be really complicated, and they assumed it was complicated for a reason.

> Adam: [...] He also wanted people to use JavaScript properly – use semicolons, use a functional style, don’t use eval, use JSLint and so on.

They could have done the same with XML, i.e. define a simple-XML subset without schema, CDATA, entities, etc. Instead they built it on top of another language that is so infamous that they felt the need to write JSLint.

> Adam: The thing they came up with, Doug’s idea for sending JavaScript data back and forth, they didn’t even give it a name. It just seemed like the easiest way to talk between the client side and the backend, a way to skip having to build XML parser in JavaScript.

So the original reason was that they could use eval(jsonstr)? Given the security implications, they would have been better off writing a JSON parser. At that point, is it any better than writing a simple-XML parser? At least that would have saved them from the "it's not a standard" discussions.
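
The hazard is easy to show (in Python here rather than the original JavaScript eval, but the point is the same): a parser only ever builds data, while eval runs whatever arrives.

    import json

    untrusted = '__import__("os").system("echo pwned")'

    # eval(untrusted)      # would execute the attacker's code
    json.loads(untrusted)  # raises json.JSONDecodeError instead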


> a lot of people started programming in this thing and were writing in a style of programming that the professional programmers of the day thought was impossibly hard, which was doing stuff based on events.

Not so different from today. That quote is about HyperCard, not JS, by the way.


I really hope that one day CUELANG will catch on to generate and validate JSON.

The current state of JSON generation/validation is simpler than the XML ecosystem, but a bit hackish.

We can have a much better stack.


> Oh, I did that. I didn’t intend to fight the federal government, sorry.

Seems politeness goes a long way when you're facing federal charges


I can't listen to this because the host sounds like a text to speech engine.


I read it. It's a quick read.



