JSON vs. XML (corecursive.com)
152 points by geffchang on April 6, 2023 | 245 comments



This quote is funny:

    Douglas: The first time I saw JavaScript when it was first announced in 1995, I thought it was the stupidest thing I’d ever seen. And partly why I thought that was because they were lying about what it was.
A bigger more interesting thing though is how his company failed, in part, because they used hand-rolled JSON for messaging.

    Douglas: And some of our customers were confused and said, “Well, where’s the enormous tool stack that you need in order to manage all of that?” 

    “There isn’t one, because it’s not necessary”, and they just could not understand that. They assumed there wasn’t one because we hadn’t gotten around to writing it. They couldn’t accept that it wasn’t necessary.

    Adam: It’s like you had an electric car and they were like, “Well, where do we put the gas in?”

    Douglas: It was very much like that, very much like that. There were some people who said, “Oh, we just committed to XML, sorry, we can’t do anything that isn’t XML.”
I started my career during peak XML craziness and, while I liked parts of it at the time, the number of things it was used for was quite insane. I had to maintain a system once where a major part of it was XSLT, when it could have just been a simple imperative algo with some config settings.

Anyhow, hope you like the episode!


> I had to maintain a system once where a major part of it was XSLT

Every time the topic comes up I feel the need to say that I loved XSLT. It was so nice. XML frankly was kind of simple, too. It had elements and attributes and that was it. And it had xpath, which offered, among other things, a parent axis, so you could walk the node tree upwards.

In JSON you can't get to the parent from the child. And walking down a tree is unintuitive, because nodes can be of different types, and if you want to maintain the order, or use successive instances of the same things (that would have the same name) you need to use arrays, and arrays of arrays of arrays look bad. Schemas are an afterthought.
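A rough sketch of that difference, assuming the third-party lxml package and a made-up toy document: the XML node can hand you its parent, while parsed JSON cannot.

    # Parent axis in XML, a sketch using the third-party lxml package.
    from lxml import etree

    doc = etree.fromstring("<recipes><recipe><title>Soup</title></recipe></recipes>")
    title = doc.xpath("//title")[0]
    print(title.getparent().tag)     # recipe -- walk upwards via the API
    print(title.xpath("..")[0].tag)  # recipe -- or via the XPath parent axis

    # The JSON equivalent parsed with json.loads is just nested dicts/lists;
    # a child holds no reference back to its parent, so you carry that yourself.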

JavaScript is cool -- it has mostly eaten the world anyway. But JSON is not so good IMHO.


I would describe it something like: XML is great as a document format, but shitty as an RPC format. JSON is vice-versa. Web developers spend a lot of time with JSON as an RPC format, so they tend to put it on a pedestal. But try keeping your recipe collection structured in JSON text files and the pain will start immediately. YAML is even worse.

XSLT was (and still is) great for transforming documents. Want that recipe collection as HTML? Easy.


Yep:

- If you are describing hierarchical data, JSON is great

- If you are describing text with markup, especially extensible markup, for machine generation and consumption, XML is great.

- If you are describing a graph, neither have broadly accepted standards so you are kinda on your own.

Depending on your requirements, a recipe collection might be better in XML or in a flavor of markdown. A comprehensive data schema and software support for recipes could be challenging/limiting, compared to marked-up text.


Markdown (like HTML) offers formatting structure, not semantic structure. Maybe you want to query for recipes that can be made in under an hour, or that contain orange as an ingredient (as opposed to merely a serving suggestion in an orange bowl). A proper XML (or even JSON or YAML) structure would enable this, Markdown does not.

You can pretty easily translate XML to Markdown using XSLT, though.

I don't think hierarchical structure is the differentiator; recipes and web pages are hierarchical and they'd still be hell in JSON. XML handles hierarchy just fine. I think the differentiator is whether your content is a document, that is, composed significantly of multiline text. Multiline text in a JSON file tends to be human-hostile, but we're all comfortable editing e.g. HTML.


> If you are describing a graph, neither have broadly accepted standards so you are kinda on your own

Relational databases can describe most graphs, but they rarely ever have a great text format.


And CSV for tabular data!


If only it was more standardized :-(


I first touched XSLT in 2010. I appreciated what it could do, but it was painful to work with due to poor documentation and tooling. This has only gotten worse by comparison with alternatives.

You can still do XSLT in the browser. You can serve arbitrary XML and transform it. As an example, Atom feeds on my website (such as <https://chrismorgan.info/blog/tags/meta/feed.xml>) render just fine in all mainstream browsers, thanks to this processing instruction at the start of the file:

  <?xml-stylesheet type="text/xsl" href="/atom.xsl"?>
But working with it is not particularly fun, because XML support in browsers has been only minimally maintained for the last twenty or so years. Error handling is atrocious (e.g. largely not giving you any stack trace or equivalent, or emitting errors only to stdout), documentation is lousy, some features you’d have expected from what the specs say are simply unsupported (and not consistently across engines), and there are behavioural bugs all over the place, e.g. in Firefox loading any of my feeds that also fetch resources from other origins will occasionally just hang, and you’ll have to reload the page to get it to render; and if you reload the page, you’ll have to close and reopen the dev tools for them to continue working.


I think out of the box, browsers can only do XSLT 1.0; but Saxon offers a JS version of their engine that does XSLT 3.0 and is free (as in beer): https://www.saxonica.com/saxon-js/index.xml


It’s actually worse than XSLT 1.0 due to inconsistencies and incompletenesses. For example, Firefox doesn’t respect <xsl:output method="html">, but uses an XML parser on the transformed result regardless; and doesn’t support disable-output-escaping. I wanted these for my Atom stylesheet (for <content type="html"> and the likes; instead I had to emit serialised HTML and decode it in JavaScript, though with difficulty I could have done feature-detection to skip that step if disable-output-escaping worked).

Even perfunctory probing shows fairly serious problems in Firefox (where Chromium is consistently much better, in this specific area). I could file quite a few bugs in short order (e.g. these mentioned, bad document.contentType values, <template> not working properly), but I don’t think there’s any interest in fixing things.

(I wrote this comment as much for my own future reference as anything else. XML/HTML polyglot stuff makes things decidedly messy at times.)


> In JSON you can't get to the parent from the child. And walking down a tree is unintuitive, because nodes can be of different types

JSON only competes with XML. XSLT, XPath, and XSD are just as much an afterthought in that they are completely separate from XML and are entirely optional. The engines written around those are where the power to walk the tree and validate comes from, not XML itself. There's a wide range of tools to get the same benefits for JSON sources, and they usually handle XML and other data sources too, because it shouldn't matter. The reason the X* tools have fallen out of favor is because they're unnecessarily tied to a single type of source data.
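For example, JSONPath gives you roughly the XPath experience over plain JSON; a small sketch assuming the third-party jsonpath-ng package:

    from jsonpath_ng import parse

    data = {"cds": [{"title": "La Brise", "artist": "Arax"},
                    {"title": "Led Zeppelin II", "artist": "Led Zeppelin"}]}
    # Query the structure much like an XPath expression would.
    titles = [m.value for m in parse("cds[*].title").find(data)]
    print(titles)  # ['La Brise', 'Led Zeppelin II']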


JavaScript is as good as JSON. It has eaten the world just because it was in every browser. Similarly Chrome, advertised on the biggest search engine.


> I started my career during peak XML craziness and, while I liked parts of it at the time, the number of things it was used for was quite insane. I had to maintain a system once where a major part of it was XSLT, when it could have just been a simple imperative algo with some config settings.

Same here. XML was going to save the world! Remember XML data islands with data embedded in page source and displayed via XSLT?

The craziest thing I had to build was a tool to manage the dozens to hundreds of XML configuration files that powered our product. The tool allowed editing and deploying the files, complete with validation and even input suggestion based on associated XSD for each XML file.


I remember the XML is everywhere phase. The community that hasn't retired or passed on has largely come off of that. You can return JSON natively from XSLT 3.0 now. I've been on both sides of the love/hate fence with XML, but these days when I have the need to work with it I leave the projects really satisfied.


The best part of the XML craze was the deprecation of proprietary binary file formats. I had to work with some software that was truly awful, but since it saved its data as XML, I could skip the software entirely and work directly with the data.


Nowadays you have the same type of tools, only for YAML. Not sure if that's so much better.


Which tools are you thinking of?


The Github actions editor for example.


YAML can die in a fire


I do just like this with json. Creating api blueprints.


I really respect that you provide transcripts. It's terribly important for accessibility and for getting what people have said into the various search engines.

I was sad to hear that Crockford is not aiming to be the author of "the next language" anymore, but I wonder how sincere that really is. His thoughts on actor-based languages are interesting.


Thanks!

Crockford's thoughts on actors are really interesting. I tried to pull them apart but I didn't get very far and ended up not including them in the episode.

What he is envisioning is not exactly like Erlang but not exactly like Scheme. He said that Carl Hewitt had a lot of ideas and they were hard to unpack.

If you're interested though, I would reach out to him. He is very approachable and excited to talk to people with ideas for new ways of making things simple.


I do wish some more details about his current thoughts on actors had been included, though I can understand if they were hard to tease out. I would guess that may go back to his work on Electric Communities as E was a message passing system. Would be interested in hearing more of where his mind is on actors today.

The closest thing we have right now I think is Spritely Goblins, though that is Scheme. (Not coincidentally, one of the other Electric Communities co-founders is also a Spritely Institute co-founder: https://spritely.institute/about/)


The design of the language he is (was?) working on is here:

http://www.crockford.com/misty/

There are a few talks from a few months ago on Youtube that detail some of the rationale behind it.


Oh cool, thanks for the pointers!


Just wanted to say your podcasts are really good. Many podcasters aim for quantity over quality; with you it’s the other way around. Thank you!


I remember a meeting where a consultant from an MCP excitedly told our mutual client that the XP in the upcoming version of Windows stood for 'XML Protocol.'

More innocent times.


I had a power strip which had "works with windows 95" on the packaging box.


This is just an example of how marketing knows nothing about the products they are marketing, or are just flat out snake oil sales. Like packaging for bacon exclaiming "gluten free"


>This is just an example of how marketing knows nothing about the products they are marketing

Or it is a question that people who are not familiar with technology will ask frequently and makes no sense to people who are.


I use bing, and tell people to 'bing it' sometimes (instead of 'google it').

I had a family member ask "will bing work with my yahoo?"

So... I know those words individually, but... man... together, in that order... I don't know how to respond. I think I said something like "don't worry about it - it's not that big a deal" and left it at that.


I mean... but it is? They're not wrong. Either the marketing works or it doesn't, but it's not false. If it works, then it's good marketing. If it doesn't, then the marketers don't know their market.


Ah yes, "this cereal is asbestos-free!" kind of marketing. It's on the same page with non-GMO salt.


Exactly. It doesn't really matter that the claim is technically true, it just has no bearing whatsoever on the product itself.


Wow, and I thought my headphones from 1998 that said "MP3 Ready" on the package were stupid.


Did it work with 98?


I wasn't working at the same place in 1998, so I can't verify.

The hype around Windows 95 at the time was... incredible. Midnight store openings, hours-long line waits... insanity.


> The hype around Windows 95 at the time was... incredible.

I left one job for another in 1994 simply because the new company had access to the Windows Chicago beta program (and it was a 20% salary bump). But the main reason for me choosing that org was the beta, because I had other offers with similar salary bumps at the same time.


Scala had XML literals as part of the language!

Apparently Philip Wadler was the person who told them they needed it, because the future was XML.

(Wadler is a big Haskell/PL person)


That's surprising! Wasn't it Philip Wadler who said "The essence of XML is this: the problem it solves is not hard, and it does not solve the problem well."

http://harmful.cat-v.org/software/xml/


The irony, to me, is how it is viewed as a bit of a mistake that they had XML in Scala. Hard to square that with how instrumental JSX was in getting some of the modern JavaScript frameworks as far as they have gotten.


If the web spec had a JSON analog for HTML, I think you'd see a lot less love for JSX.


Maybe? Though, I suspect JSX just got lucky with when it got popular. There were plenty of mixed HTML/Script things in the past. Most of them met a lot of resistance.


ActionScript 3 as well as the E4X support in Mozilla's Browser+JS/XUL engine for a while (removed now afaik).

Around that time it was pretty nice passing around XML, as I was forced to work with VB.Net which also had an XML literal syntax on the backend and Flash/AS3 on the UI.

I had built a POC with E4X that was VERY similar to React/Redux over a decade before React, but the other browser vendors didn't have it... At the time IE and Chrome were shifting towards JSON.


VB.NET still has them. I remember doing a project with an XML database (ugh) back in college when the version with the literals was released. I was ecstatic.


Was it far from JSX?


It was almost literally JSX (except with Scala syntax instead of Javascript, of course).

An XML literal in Scala would have been:

  // XML literals (to be dropped)
  val mails1 = for (from, to, heading, body) <- todoList yield
    <message>
      <from>{from}</from><to>{to}</to>
      <heading>{heading}</heading><body>{body}</body>
    </message>
  println(mails1)
This was replaced with "XML string interpolation", which looks like this:

  // XML string interpolation
  val mails2 = for (from, to, heading, body) <- todoList yield xml"""
    <message>
      <from>${from}</from><to>${to}</to>
      <heading>${heading}</heading><body>${body}</body>
    </message>"""
  println(mails2)


What are some examples of the "enormous tool stack" required for XML? I ask, because I came into software development after everyone adopted JSON. When I do need to parse XML, there was a library I could use, although I will admit that needing xpath was a bit annoying.


If your XML is written the way people write JSON, then the stack isn't enormous. But XML is usually wrapped in layers of additional complexity: SOAP envelopes and the namespaces they require, the XSLT that someone invariably used to write an XML transformer, etc.


Also broken parsers. So many broken parsers. Honestly this is what keeps JSON going: the parser was simple because the language was simple, and as a result any JSON you got basically worked the same way.

Not so with XML: all the parsers were insanely complex with the namespacing and whatnot feature support and possible external URLs and everything else... and as a result no XML library was ever adequate to interface with anything. On multiple occasions, the best way to build XML for something was to take a working copy, and then glue text together so you would exactly replicate whatever that specific application wanted, rather than trying to use anyone's library for it.


This was before my time, but I believe the WS-* series of specifications is an example.

> Like with the original J2EE spec, which sought to complicate the basic mechanics of connecting databases via HTML to the internet, this new avalanche of specifications under the WS-* umbrella sought to complicate the basic mechanics of making applications talk to each other over the internet. With such riveting names as WS-SecurityPolicy, WS-Trust, WS-Federation, WS-SecureConversation, and on and on ad nauseam, this monstrosity of complexity mushroomed into a cloud of impenetrable specifications in no time. All seemingly written by and for the same holders of those advanced degrees in enterprisey gibberish.

https://world.hey.com/dhh/they-re-rebuilding-the-death-star-...


> When I do need to parse XML, there was a library I could use, although I will admit that needing xpath was a bit annoying.

It sounds a bit like someone paved a garden path for you by that point. One of the reasons for the "enormous tool stack" wasn't just depth of tools needed ("tool X feeds tool Y which needs tool Z to process namespace A, but tool B to process namespace C, …"), but also the breadth. I recall there were at least six types of parsers to choose from with all sorts of trade-offs in memory utilization, speed, programming API: a complicated spectrum from forward-only parsers that read a node at a time very quickly but had the memory of a goldfish through to HTML DOM-like parsers that would slowly read an entire XML document all at once and take up a huge amount of memory for their XML DOM but you could query through the DOM beautifully and succinctly. (ETA: Plus or minus if you needed XSD validation at parsing time, and if you wanted the type hints from XSD to build type-safe DOMs, etc.)
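To make the streaming-versus-DOM trade-off concrete, here is a rough stdlib-only Python sketch (the tiny document is made up): a forward-only pass that keeps just its own state, versus a tree parser that holds everything in memory but is trivial to query.

    import io
    import xml.sax
    import xml.etree.ElementTree as ET

    xml_bytes = b"<order><part_number>1</part_number><part_number>2</part_number></order>"

    # Forward-only streaming parse: constant memory, goldfish memory.
    class PartCounter(xml.sax.ContentHandler):
        def __init__(self):
            super().__init__()
            self.count = 0
        def startElement(self, name, attrs):
            if name == "part_number":
                self.count += 1

    handler = PartCounter()
    xml.sax.parse(io.BytesIO(xml_bytes), handler)
    print(handler.count)                     # 2

    # DOM-style parse: whole tree in memory, but easy to query afterwards.
    root = ET.fromstring(xml_bytes)
    print(len(root.findall("part_number")))  # 2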

A lot of XML history was standards proliferation in the xkcd 927 way: https://xkcd.com/927/

XPath tried to unify a lot of mini-DSLs defined for different DOM-style XML parsers.

XSLT tried to unify a bunch of XML transformation/ETL DSLs.

The things XPath and XSLT were designed to replace lingered for a while after those standards were accepted.

Eventually quite a few garden paths were paved from best practices and accepted "best recommended" standards, and greenfield projects start to look easy, with a small number of well-coordinated tools. But do enough legacy Enterprise work and you can find all sorts of wild, brownfield gardens full of multiple competing XML parsers using all sorts of slightly different navigation and transformation tools.


The last time I worked with XML, using an external library wasn't really a great option. I ended up writing my own parser in C++. It took about a week to get all the features required for my purpose.


Honestly, if you just need a one-off transformer, VB.Net is probably one of the better options. The .Net XML library is pretty good in the box, and VB.Net has XML literal support on top... if you just need to read, then C# is a better language imo.


VB.NET is masochism, and XML literals are a dumb, confusing, and unnecessary feature. Creating XML elements with regular code is quite easy, and at scale, you probably want templates.


XML might not have been so bad, if there weren't dopes pushing SOAP.


I have huge respect for Doug Crockford, and I never imagined I would disagree with him.

However I think by now we've seen that a lot of that "unnecessary" XML complexity was not, in fact, entirely unnecessary. These days we use JSON for everything, but now we've got JSON Schema, Swagger/OpenAPI, Zod, etc etc. It's not really simpler and there's a lot of manual work - we might as well be using XML, XSD & SOAP/WSDL.


XML is pretty awesome. As a 12 year old in 1998 I remember being enthralled with it and my young mind imagined a lot of possibilities. But alas, I had no real use for it until xhtml, but even then...As the years passed I learned to drive and get the heck off my computer for a few years.

It wasn't until about a decade later when I finally got to use XML "for real", at my academic publishing job. One of my first real projects was having a set of academics analyze documents in a web application I built. Prior to that the documents were analyzed by hand, converted to SGML somewhere in Korea, and we would use OmniMark to move them to XML and eventually into a library application.

The XML community, the ones who haven't retired or passed on, have been more welcoming of the competition too. They went from XML is everywhere, to being able to return JSON from an XSLT. I am in a small shop, and so I wear many hats. But I am always satisfied when I get to work with XML, or craft an xsl/xq script that does exactly what I need. Additionally, the community as a whole is very helpful, and a bit more grey. Meaning, they are less likely to fall for trends and bullshit.

A bit disjointed, but, in short, XML is awesome. Now if only they would move Balisage back to Montreal. I'm no fan of DC or virtual conferences.


With XML, the complexity is the baseline, and it only goes up from there. With JSON, the complexity is just an option, the baseline is pretty simple. Also, good XML-tools are rare or expensive.


Baseline for XML would be a document that doesn't use schemas, namespaces, attributes, or any of the SGML legacy stuff like DTDs and PCDATA.

Such a document is essentially as simple as the equivalent JSON.


Even that is more complicated than JSON.


Care to elaborate?


Every key is written twice, for opening and closing. Keys can be duplicated, and in fact that's what you have to do if you want a simple list. There aren't numeric types, so you have to parse strings. It also looks horrible.

  <cds>
    <cd><title>Led Zeppelin II</title><artist>Led Zeppelin</artist><price>999</price></cd>
    <cd><title>La Brise</title><artist>Arax</artist><price>999</price></cd>
  </cds>
or

  <cds>
    <cd>
      <title>Led Zeppelin II</title>
      <artist>Led Zeppelin</artist>
      <price>999</price>
    </cd>
    <cd>
      <title>La Brise</title>
      <artist>Arax</artist>
      <price>999</price>
    </cd>
  </cds>
vs something like

  [
    {"title": "Led Zeppelin II", "artist": "Led Zeppelin", "price": 999},
    {"title": "La Brise", "artist": "Arax", "price": 999},
  ]
You can probably do better using XML attributes. But then you're using more features.


If we are complaining about the closing tags, might as well add that embedding newlines or quotes into JSON is less than pleasant.

Which is to say, this feels like a bit of a non-issue. Yes, writing it by hand can get tedious, but that is true of any and every format. That's why you will almost certainly reach for other formats if writing a long list of data. And each and every one of them will fail for some form of input in ways that are frustrating.


Writing that JSON example by hand wasn't tedious. The XML example was, and the result is unreadable. It's important to be able to debug things easily. I'm going to manually type JSON when I'm testing an API, and I'm going to read the response.

If you absolutely don't care about human interface, no reason to use XML either. It's meant to be more verbose. The XML tags will often dominate the size of the payload with things like `<question>Who</question>`, so you have to start thinking about shorter names. Yes JSON has a similar problem, but at least it's halved and you don't have to instruct everyone to call each list element "e". If you super care about size, you'll use protobufs or something.


<question>Who</question> vs. "question":"Who",

To me, this does not seem like a win that's worth much, especially since it's likely to shrink considerably even with naive fast compression.

Furthermore, as messages grow in size, the explicitly named closing tag actually kind of starts helping.

Both of these syntaxes have their annoying quirks, for sure, and I understand you really dislike the closing tag; that clearly doesn't bother many people.

But regardless of personal preference, I'm really skeptical any of this really explains json's relentless path to replace (most) xml. Other reasons, such as the extreme wordiness some xml apis chose, the poor implementation of namespaces, the problems with embedding arbitrary data (in particular control characters), the inconsistency between attributes and elements, the lack of support for numbers, the lack of (conventional) support for key-value pairs - all of these surely played a much greater role than a fairly limited syntax issue.

And it's not even like json is without impractical quirks; lack of comments, the ban on trailing commas, and the need for quotes in object-keys spring to mind. Yet those don't mean json is likely to die out soon - even though even javascript itself from which it is derived doesn't suffer from those (anymore)!


Not wrong, but also probably not really indicative of problems or actual use. And while I will be manually typing some data to go into an api for testing, I'm far more likely to be typing it in something that is looser in what it accepts than a json document. Literally today, just using dicts in python. And even then, my debugging is dominated by mistakes in data entry there.

Also, I see you took it to be a full on defense of XML. I did not really intend it that way. I think both can be fine. And insisting on either is likely a mistake.

I do find your nitpicks here amusing, still. Size of tag is just as obnoxious as size of key. And, though it can dominate the textual representation, there are clear ways to reduce that. Even knowing that BSON and Binary XML exist, though, I'd be hard pressed to name any project that failed because it wasn't using them.


JSON vs XML isn't going to make or break your project. But why would you use XML for data interchange? It makes sense for things like HTML where you're writing a document, but otherwise, it's usually just a needless burden.

Like, if I were there when XMPP was created, yes I would have insisted on JSON. XML was a plainly bad choice. Edit: Oh, JSON didn't exist until a little later. Maybe something similar did.


I mostly agree. I do think Jupyter chose wrong by picking JSON for their documents. They are literally marked up source documents.

XML does have the "benefit" of being a bit more extensible than JSON. Specifically, being able to have namespaced elements in there does make some sense on paper. For example, you could have two extensions both add in data using the same keys, but different namespace. Can't really do that with JSON.

In practice, I think it just fell flat due to way too much "forethought" in things they anticipated people wanting.


Yes, XML is probably a good fit for something like Jupyter. Basically if you want to reuse a lot of "objects" throughout a structure and have them mean the same thing in different nested parts of it. Like how <a> in HTML means a hyperlink whether it's under <body> or some nested <div>.


I'd phrase it more that there is a document with mixed use items marked up throughout it. Some items in the document are code, in which case you probably want to fence the code with a marker on what language is used. Other items are just prose, in which case you'd like to just write the prose as much as you can.

Some items can even be other forms of xml that have their own schemas dictating what is valid. (Thinking SVG here.)

I'll also note that even there, I can see why HTML went with the odd parsing they do. XHTML tried going with "well formed" documents, but that falls flat for the authors. That's why "sections" of a document are essentially just collecting all of the "h" tags and making an implied tree out of that, as opposed to making the tree directly. To that end, my markup language of choice for Jupyter-style things is org-mode in emacs. Yes, it has some warts; but again, all formats that I have ever seen have warts.

Edit: I want to add that I don't intend this as a "correction." I should say that I agree with your post. Complicated field where I doubt I'd have done better than most others. :)


Yeah, now express this in JSON:

   <div>
     <p>JSON example:</p>
     <pre>
      [
        {"title": "Led Zeppelin II", "artist": "Led Zeppelin", "price": 999},
        {"title": "La Brise", "artist": "Arax", "price": 999},
      ]
     </pre>
     <p>Source: <a href="https://news.ycombinator.com/item?id=35472014">click here</a>!</p>
   </div>
JSON is great for a certain domains, but there are other domains where it is a nightmare and XML shines.

Use the right tool for the job.


you can't ignore ux stuff like this in a protocol that's meant for general use

something like duplicating info in closing tags in XML (which applies to every element) isn't really comparable to stuff like having to escape certain characters in JSON strings (which applies only to the values that use those characters)

perfect is the enemy of the good, and the good is the metric


Don't you also have to escape stuff in XML? Like &gt;, which is even worse.


Yes, though many languages have lenient parsers. Most browser parsers, for example, will probably only be lenient if parsing "HTML."

    new XMLSerializer().serializeToString(new DOMParser().parseFromString("<a>hello < </a>", "text/html")) 
The above in my console does as expected there. And again, entities are a very dangerous part of XML and friends.

You are correct that if you tell it that that is xml, the browser will throw it back at you. Just as the JSON parser will barf on JSON.parse("{'test':'value'}").


per specifications, json parsing is not lenient, html parsing is lenient


Right, and amusingly, more than a few json parsers are very lenient in this. That or folks abandon ship fairly quickly and go for another spec that is far more friendly.


well json definitely does not accept `{'test':'value'}` as valid input

any parser that behaves otherwise is pretty clearly buggy

json has many problems but parsing ambiguity is not really one of them


Me thinks you have never looked at the field. I'd as soon declare csv is an error free format. Only true if you ignore the proliferation of applications that get it wrong. In subtle ways, often. Still wrong.


csv is wildly ambiguous, to the frustration of ~every data science engineer in industry

json is not

show me an application that parses `{'a':'b'}` as valid JSON, i'm actually interested, probably there are some which exist, but there is no ambiguity about those applications being wrong



fun doc! it lists many of the undefined behaviors of the spec, and many of the problems in common parsers

afaict none of them permit keys or value strings to be expressed with single quotes


Apologies for the, in retrospect, somewhat lazy posting of an article with no comment. I thought that article had a section about how many of them allow single quotes if you don't "enable strict." I am not seeing it on review, though; so either I made that up in my mind, or I'm remembering another article. Either way, apologies.

I did find https://github.com/json5/json5 on a quick search that basically says what I asserted about people just jumping to another standard for things that you hand write. I was probably also thinking heavily about python's dict syntax. (And I confess, I still don't know when to use single versus double quotes in python...)


no worries mate


To be pedantic, html parsing is not lenient, it is unambiguously specified.


if that were true then browsers would refuse to render text/html responses that didn't include a closing </html> tag, i guess


No, because the closing </html> tag can be omitted according to the current HTML spec. See https://html.spec.whatwg.org/#optional-tags


this is exactly my point

html is not precisely defined


Sorry I don't understand your argument. HTML is fully and unambiguously defined, as you can see if you follow the link. Some tags are optional in certain contexts, but this is also precisely defined.


I think you're missing the point that it is defined, the current html5 spec says that <title> implies the existence of <head>, <body> implies the end of <head>, body tags imply the end of <head> and the start of <body> etc.

HTML5 is not XHTML.

<!DOCTYPE html> <title>Title <h1>Heading

expands to

<!DOCTYPE html> <head><title>Title</title></head> <body><h1>Heading</h1></body>


if `<title>A <h1>Heading` is equivalent to `<head><title>A</title></head> <body><h1>Heading</h1></body>` then this means the language is not precisely defined


Thanks. I get your point about the close element including the tag name - but that's the kind of detail I leave to the serialisation library, in the same way that the close scope token in json is different to the start scope token.

As for "looks horrible"... well yeah, I always feel that xml looks "spikey" somehow. But I've been programming in curly-brace languages for 30+ years and I still find json harder to read than xml: I think my brain tries to interpret it as code, not data. I find xml easier to read (even when its unformatted) precisely because the close-tokens kind of document what element they're closing.

Each to their own I guess. At least we're not stuck using ASN1.


> At least we're not stuck using ASN1.

Prepare for trouble, and make it double: http://xml.coverpages.org/dstc-xer2.html


And if someone is nice enough to stuff a NUL in the document, it all shatters.


Also this may just be the time in which I got into programming showing, but it seems like JSON encoding/decoding has been built into more languages than support for XML ever was. That's one less required dependency and thing to have to think about in many cases, like in Swift projects all I have to do is make sure my model structs/classes conform to Codable and I'm ready to hit endpoints.


That’s because writing a JSON parser is pretty straightforward with just a couple edge cases.

Writing a conformant XML parser is a HUGE undertaking in comparison.

I could get most places to give me the time to write a JSON parser in whatever language if it didn’t have one. I couldn’t do that with XML.

Because of this, every common language (and most uncommon ones) has a JSON parser while XML parsers are less common (and fully conformant ones are even more rare).


Here to say this too. Compositional complexity is an advantage.

As a human in a repl, I appreciate the balance of readability between XML, which uses a larger set of syntactical characters, and YAML, which uses fewer.

I also appreciate JSON's ontological simplicity over XML. This primarily boils down to the lack of attribute nodes and explicit difference between objects (lists of key-values) and arrays (lists of values).


> With XML, the complexity is the baseline, and it only goes up from there. With JSON, the complexity is just an option, the baseline is pretty simple.

Very well put. And we could lower the baseline substantially towards simplicity, even from JSON.

It's pretty clear that a lot of people think this way. Some even seriously try to figure out what such a baseline of simplicity would look like.

There are lots of simple indentation-based designs (similar to YAML) such as NestedText[0], Tree Notation[1], StrictYAML[2], or even @Kuyawa's Dixy[3] linked in this thread.

There seem to be fewer new ideas based around nested brackets, the way S-expressions are. Over the years, I have developed a few in this space, most notably Jevko[4]. If there ever will be another lowering of the simplicity baseline, I believe something like Jevko is the most sensible next step.

[0] https://nestedtext.org/en/stable/ [1] https://treenotation.org/ [2] https://hitchdev.com/strictyaml/ [3] https://news.ycombinator.com/item?id=35469643 [4] https://jevko.org/


I guess it depends on how you define XML baseline. You can have a very simple XML with only bare tags. It will work just fine. Arguably, it's even simpler than JSON that way. A basic parser for that is probably not more complex than a JSON parser.

All the optional complexity that can go on top, though, is probably better specified for XML. Transformation is well defined for XML (XSLT) but not at all for JSON (I guess, you write your own code to manipulate native objects).

Schemas are basically a native feature for XML. Not so much for JSON.

All sorts of specialised vocabularies are defined for XML. A few are defined for JSON, too.


For a lot of XML you need to be able to support XML namespacing, and doing that adds a lot of complexity over the original pure XML.

At first XML namespacing sounds simple. Each tag and attribute will have an optional uri attached to it, no big deal right?

From reading through the specification one could be forgiven for assuming that the prefixes are just arbitrary mappings that a processor can ignore, or automatically remap to alternate prefixes.

For example, it is true that <abc:a xmlns:abc="https://example.com/xyz" xmlns:def="https://example.com/xyz"><def:b>5</def:b></abc:a> (notice both namespaces are the same url) is equivalent to: <a xmlns="https://example.com/xyz"><b>5</b></a>.
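You can check that equivalence quickly with Python's stdlib ElementTree, which throws the prefixes away and expands tags into {uri}local form; a small sketch:

    import xml.etree.ElementTree as ET

    a = ET.fromstring('<abc:a xmlns:abc="https://example.com/xyz" '
                      'xmlns:def="https://example.com/xyz"><def:b>5</def:b></abc:a>')
    b = ET.fromstring('<a xmlns="https://example.com/xyz"><b>5</b></a>')
    print(a.tag)                                     # {https://example.com/xyz}a
    print(a.tag == b.tag and a[0].tag == b[0].tag)   # True -- the prefixes are gone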

Unfortunately, the data model also allows for content to reference the namespaces by prefix, and therefore every general xml processor that supports namespaces must keep around an application accessible mapping from the prefixes to namespaces, as the application may need to be able to access that information to interpret attributes or content. The only exception to this would be if the general XML processor insisted on having schema information for every namespace it might come across. In that scenario it would be able to tell if an attribute value of "abc:b" is really a string literal, or a reference to a namespace identifier (QNAME data type), where the namespace is whatever the current "abc" prefix is bound to, and the identifier is "b".

But obviously we don't want to add full schema support for a simple implementation, so we need to keep the mapping information around, just in case the application needs it. We also cannot easily offer nice features like changing a document to use preferred prefixes for certain namespaces, unless we also keep any prefixes that are used in values that could be interpreted as QNAMES, just in case they actually are, but our processor does not know, because it has omitted schema support for simplicity (or perhaps it included schema support, but does not have a schema available for some namespace).

And that is just the complexity that stems from one fairly small quirk in how XML works.

You also have no idea if an element's content needs to preserve whitespace or not if you don't know the schema, and don't happen to have an xml:space attribute present. Thus if you want to re-indent arbitrary xml for readability safely you could end up with something like this:

    <abc
        ><def
            >5</def
        ></abc
    >


I understand what you're getting at but that is you choosing a higher complexity baseline. Yes, it's a part of the standard but you can choose not to support it. No one said you have to support all of the XML-verse in order to use it effectively in your particular application. The most common cases are usable without any of it. Look at most RSS/Atom feeds, XHTML, SVG. They all can get by with simple tags and attributes.

I'm just not buying the argument that XML's complexity is somehow remediated in JSON. JSON becomes as horrible as XML when you bring it up to feature parity. And that's when there's a way to match features. Whatever people say about XSLT, it is powerful, reasonably well defined, and generic over all documents (even though complex). There's nothing like it for JSON I know of.


If we are going for simplicity, surely S-expressions wins? You can support structures similar to JSON or XML on top of it, but the baseline is simpler.


The new KiCad file formats are all S-expression based[0], except for the project files which are JSON IIRC. I think it works pretty well for representing a tree of typed objects textually. They don't even have any LISP connections. Haven't seen S-expressions used anywhere else, though.

[0]: https://dev-docs.kicad.org/en/file-formats/sexpr-intro/


I’d speculate that human minds and memories work much better with associative structures rather than sequenced ones. JSON draws a clean separation between these two and as a result has clearer syntax for the former.

ie, the benefits of simplicity have a limit.


I don’t write many APIs but every JSON schema I’ve created has been automatically generated by OpenAPI tools. Even then I’ve found schemas of very little use, because everything gets validated on deserialization anyways. Client-side validation is usually already taken care of in practice because users should be serializing using the same type library that deserializes, or reading the docs very thoroughly.

JSON is so much more ergonomic than XML as the lingua franca because I can actually read it. That being said I still have my share of problems with JSON.


That was the cause of the XML problems - everything was generated.

Me? Schemas are a requirement in areas where you need to integrate over different technology / with different implementations. JSON Schema is in those contexts a bit of a kids toy compared to what XML can do.


We’re using Prisma (https://prisma.io) schemas for a particular data exchange project we’re doing so that we can generate JSON schemas, SQLite schemas, PostgreSQL schemas, etc. We have even found a generator to create basic Elixir code from the Prisma schemas.

We’re not using anything else from Prisma, but if we had to implement something else in JS to talk to a database, that would be a contender for our database interface layer (there are only a couple of others that are even remotely usable, having suffered through the disaster of a Sequelize implementation). We’re more likely to use Elixir and Ecto.


Adding to the problems of generated schemas, Microsoft and Sun both had different views on how they should be generated. I bought into the promise of "build a wsdl" and you can get clients from .NET and Java. I lost all of that buy in. Hard.

I don't know that I can lay the blame on either one of them directly, mind. But the industry definitely suffered from the bad faith cooperation of those companies.


Microsoft, Sun, IBM, HP, Oracle et al explicitly made WSDL and related technologies not interoperate... and that is where JSON + universe has been a delight.


Totally fair. Not sure why I limited my memory to just the two companies.

I'm not clear on how JSON as a format has helped interaction. I'm reminded of like efforts to standardize how information is stored on pages. By and large, that ship sailed and sites that have remained somewhat stable have driven how we look for information on them. All without having to add new schema languages or tools.


I can still read the generated JSON.


> because everything gets validated on deserialization anyways

First, it really depends what you're deserializing with. There is a lot of code out there that just does JSON.parse and then starts accessing the data and then you have an "undefined" get passed deep into the call stack where maybe it explodes or maybe the program just misbehaves. So if you're using a language like JavaScript or Python, then a JSON schema can be used to validate input right away. Think of it like enforcing a pre-condition.

It's also useful in cases where JSON is being used for configuration files. At my company we have quite a few places where JSON files checked-in to a git repo are our source-of-truth which then get POST'ed to an API. We can enforce the schema of those files using pre-commit hooks so no one even wastes time opening a PR that will fail to POST to the API. The same JSON schema is also used by the API to ensure the POST'ed data is correct.
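A minimal sketch of that pre-condition idea, assuming the third-party jsonschema package (the schema itself is a made-up example):

    import json
    import jsonschema

    schema = {
        "type": "object",
        "properties": {"someSetting": {"type": "boolean"}},
        "required": ["someSetting"],
    }

    payload = json.loads('{"someSetting": true}')
    jsonschema.validate(payload, schema)   # raises ValidationError on bad input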


> First, it really depends what you're deserializing with. There is a lot of code out there that just does JSON.parse and then starts accessing the data and then you have an "undefined" get passed deep into the call stack where maybe it explodes or maybe the program just misbehaves.

I disagree, this example is just sloppy programming. Passing unvalidated data deep into a program is bad, I'm not arguing for that. What I'm saying is that you should be converting your unvalidated serialized data into a structured type right on the edge. Your data type/type system should __be__ your schema/validator.

> So if you're using a language like JavaScript or Python, then a JSON schema can be used to validate input right away. Think of it like enforcing a pre-condition.

This is what I do with python+pydantic:

    import json
    from pydantic.dataclasses import dataclass  # validating drop-in for stdlib dataclasses

    @dataclass
    class Foo:
        bar: int

    foo = Foo(**json.loads(json_buff))  # raises on missing or invalid fields
I'm not the biggest fan of pydantic here because you'll have to handle an exception for invalid data instead of an Option or Result in a better type system. But w/e.

> It's also useful in cases where JSON is being used for configuration files. At my company we have quite a few places where JSON files checked-in to a git repo are our source-of-truth which then get POST'ed to an API. We can enforce the schema of those files using pre-commit hooks so no one even wastes time opening a PR that will fail to POST to the API. The same JSON schema is also used by the API to ensure the POST'ed data is correct.

You can easily do this with serdes and a type library as well.

---

I guess schemas may be useful for crossing language boundaries, but you're going to need language specific types/objects at some point so why use schemas directly even then? (I think gRPC may have code gen tools for this purpose).


JSON is great, but I surely wish it supported comments. That's the nature of its failings: too minimal.


That depends on what you want it to be. For a data interchange format, having no comments is arguably a strength. For a config file format, having no comments is a big weakness.


Just do

  {
    "someSetting": true
    "comment": "TODO change to false when ready"
  }
Though really text-based protobufs are better for config.


Problems are that some tools will rewrite the file and reorder the "comment" away from what it's meant to comment on. Also might complain the "comment" item isn't expected there. I seem to remember package.json suffering from both of these under the control of npm.


Yeah it's definitely a limited solution, and I prefer that JSON be kept simple that way. Many package.json-adjacent configs like the Babel stuff can be in .js files, giving you comments and everything.


Many people use text format protobufs for config. It vaguely resembles yaml, and the schema is enforced by virtue of requiring a message definition in order to parse it.


This always bothered me. A coworker once suggested using fields ending in 'notes' to put in comments but I never really warmed up to that.


I have heard that too but it’s just a terrible idea.


Luckily a good number of parsers support extensions to JSON like comments and trailing commas.


Comments are simple to parse, but preserving them on the dump is complex. I guess they were sacrificed for the simplicity.


They were excluded so people wouldn't use them to insert meta-processing instructions into the JSON doc.

In reality people insert those meta-processing instructions in other ways.


That’s a good point. It would be hard to read the JSON, modify it and then write back with comments.

But you still should have the option to at least ignore them while reading. That would make JSON config files so much better to work with.


You can use JSONC, which is JSON with C-style comments.


The ambiguity difference around lists alone makes JSON over XML compelling.

It is simpler than XML/XSD. Without the schema, you never know if a certain element should be treated as being part of a list or not. When interoperating with anything other than XML, that matters.
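A quick illustration of that list ambiguity, using the third-party xmltodict package as an example: without a schema, one child and two children come back as different shapes.

    import xmltodict

    one = xmltodict.parse("<order><part_number>1</part_number></order>")
    two = xmltodict.parse("<order><part_number>1</part_number>"
                          "<part_number>2</part_number></order>")
    print(one["order"]["part_number"])   # '1'         -- a plain value
    print(two["order"]["part_number"])   # ['1', '2']  -- suddenly a list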


I dislike SOAP and avoid working with it when I can. However, the WSDL is an excellent part of SOAP that really makes it easier to work with. Teams tended to over engineer their APIs and all kinds of cruft would develop. I like the HTTP operations with REST.

I can remember hardcoding and manipulating a bunch of nonsense legacy fields just to get a ticket created via their SOAP enterprise service bus. Not to mention all the operations that made no clear sense.


soap may have taken liberties with http to get its work done (graphql: so what??) but it really felt like we reinvented the wheel. i was consuming massive wsdls in 2013/2014 and i consume massive open api specs in 2023. did anything actually improve?


Unless the implementation defines too many things as just "Object" and you're consuming from a stronger typed language... and the generated library doesn't give you anything resembling a real interface. I've used a dynamic language (Node) a few times to bridge such wsdl/soap services to consume from C# and similar.

Consuming SOAP/WSDL from languages other than the one it's published in isn't fun. Man, some of the PHP implementations were beyond horrible... well defined REST/RPC + JSON is generally much easier in the end.


> I dislike SOAP and avoid working with it when I can.

I disagree. I think personal hygiene is very important for in-office coworking.


> I dislike SOAP and avoid working with it when I can.

Well, I'm about to take a shower now, and shame on you.


What I appreciate compared to xml is:

  - generic concepts like arrays and maps
  - lack of opportunity to invent names
Every xml schema is a potential DSL that reinvents things like these.

Other than that, it's true that the xml era was just addressing a lot of important stuff early. I guess it was only compatible with the big corp mindset and not early web dynamic / fluid / small scale apps. (A bit like how PHP started to write PSRs to avoid dynamic code / effects in libs .. formalization etc.)


Every JSON schema is also a potential DSL that reinvents everything. Yes, there seems to be some convergence on things, but object arrays in XML aren’t really any more complex than object arrays in JSON — there just might be multiple ways to represent them.

For this JSON:

    {
      "part_numbers": [1, 2, 3, 4, 5]
    }
You have two main ways to represent these in XML:

    <!-- repetition = array -->
    <order>
      <part_number>1</part_number>
      <part_number>2</part_number>
      <part_number>3</part_number>
      <part_number>4</part_number>
      <part_number>5</part_number>
    </order>

    <!-- wrapped repetition -->
    <order>
      <part_numbers>
        <part_number>1</part_number>
        <part_number>2</part_number>
        <part_number>3</part_number>
        <part_number>4</part_number>
        <part_number>5</part_number>
      </part_numbers>
    </order>
Is this better than JSON? No, not particularly. But it’s no less clear than the JSON, and it compresses pretty well (it compresses better for larger documents, obviously).

The larger problem with XML is that the tooling is often lacking outside of Java and C#/.NET and none of the tooling is well-built for the sort of streaming manipulation that `jq` does (it exists, but IMO one of the least usable ideas from the XML camp is XSLT), and JSON support is pretty universal everywhere, even if the advanced things like JSONpath and JSON Schema aren’t.

I also think that there’s a problem when you have to choose between SAX and DOM parsing early in your process. Most JSON usage is the equivalent of using a DOM parser because the objects are expected to be relatively small, but many XML systems are built for much larger documents, and therefore need to parse the stream because the memory use otherwise would be unacceptable. The use of a JSON streaming parser is much rarer, IME.


Where XML shines is when you pass more complex data types than numbers and strings. If you repeated your example with an array of dates, strictly speaking you can't even generate the JSON. We'd first have to agree on what string representation of a date we want to use. For XML it's built into the spec.


In JSON the de facto standard for datetime is (because of JavaScript) very much the Unix msec timestamp (which is always in UTC), so while it's not hardcoded in the spec you basically need to be an idiot not to do it like that, and it removes one huge headache of XML dates, which is timezones.


I don’t think that I’ve ever seen msec timestamps passed around because JSON numbers are floats, which means that there’s a limit to the precision available (which is to imply as well that currency amounts should be passed as decimal strings in JSON for safety as well).

Suggesting that msec timestamps resolves timezone issues is naïve at best, because anytime you are passing something that refers to a real time (that is, it is significant to humans) rather than an instant time (that is, it is something like an event log timestamp), you are dealing with time in a particular place, which has human impact — cultural, legal, linguistic.

Passing around timestamps as RFC3339 UTC strings with timezone names and offsets (much like one should be doing in databases) is what would be recommended for real (human) times.
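A stdlib-only sketch of the two options (the exact output depends on when you run it):

    from datetime import datetime, timezone

    now = datetime.now(timezone.utc)
    print(now.isoformat(timespec="seconds"))   # e.g. 2023-04-06T12:34:56+00:00
    print(int(now.timestamp() * 1000))         # msec form: compact, but zone-less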


Okay, so the point at which you need to adopt a schema language in toy examples is earlier with JSON, but in most practical cases you’ll want to do that in either JSON or XML (because, even if you are only using built-in types, you’ll still want to communicate the shape), so this objection is kind of meaningless.


Well, no. Because JSON & by extension OpenAPI lack a Date type you can't easily add validations about dates to those schemas. Like you can't say this particular date must be in the past in an OpenAPI spec because it has no concept of a date. The best you can do is a regex on the strings you call dates but that falls apart pretty quick.


> Like you can't say this particular date must be in the past in an OpenAPI spec because it has no concept of a date.

I don't get these types of arguments.

There's zero reason you can't write code that parses a date in an expected format (and throws an error if the date is formatted incorrectly) and then checks that the date is in the past.

Yes, it does mean you'll spend time writing more code (You know, the job you're being paid to do?), and it would be nice if your data format supported such automatic checking functionality out of the box, but to say "It can't be done!" is just plain silly.


> "It can't be done!" is just plain silly

It's a good thing I didn't say that.

>Yes, it does mean you'll spend time writing more code

The whole point of WSDLs and OpenAPI is to minimize the amount of time it takes to consume your API. Saying you have to write more code is highlighting the shortcomings of OpenAPI at doing the only thing it's built to do. Which is why companies have largely punted on providing OpenAPI specs in favor of maintaining libraries in a handful of popular languages.


I have literally never used any of these things.

The hate I have for XML is the high markup overhead. Anybody who has configured a trunk of the century product with XML config files knows what I mean; the screen is usually 2/3 XML tags, which means 1/3 closing tags, which add nothing semantically


> but now we've got JSON Schema, Swagger/OpenAPI, Zod, etc etc. It's not really simpler and there's a lot of manual work - we might as well be using XML, XSD & SOAP/WSDL.

Uh... do we? I've never used any of those. Plain JSON has always worked fine for me.


> but now we've got JSON Schema, Swagger/OpenAPI, Zod, etc etc.

You don't have to use any of those.


You don’t have to use anything for XML, either. The simplest XML document is almost indistinguishable from the simplest JSON document. Nothing in XML requires XML schemas or namespaces or anything else that is usually attributed to the complexities of XML.


I have to say that I was bit disappointed the first time I learned about JSON Schema. My immediate reaction was to wonder if they were trying to become XML.


OpenAPI is complex not because of JSON, but because it's a nearly complete description of http.


… and have proper comments.


My favourite Douglas Crockford quote, from a debate back in 2006 about why JSON was reinventing the wheel when XML already existed:

> The good thing about reinventing the wheel is that you can get a round one.

https://simonwillison.net/2006/Dec/21/crock/


> Turned out JavaScript was the first language to give us lambdas, and that was an amazing breakthrough.

I mean... with charity I can see the context and get it. But. What!?

Overall a fun read through history, even if definitely from Doug's perspective only. (As evidenced by JavaScript being an originator of lambdas...) I do find the idea that JSON was as novel as history says it was kind of odd. I remember inlining JavaScript objects years before "JSON" was a thing. Making it a subset of what JavaScript could already do seems straightforward and a good execution. Getting rid of comments feels asinine to me. (I'll also note that the plethora of behaviors you get from JSON parsers shows that it is effectively CSV. Sure, there may be a "standard" out there, but by and large it is a duck-typed one.)

I'm also a bit in the camp that XML is better than JSON. Better datatypes, for a start. Schemas that allow autocompletion. It's also easier to see as a markup language (per the name). That said, they clearly went too far with entities, and despite making sense for markup, attributes versus children are more than a touch awkward.

I also recall that what killed XML and WSDL files in general was the complete shit show of getting a single document to work with both MS and non-MS clients.


Crockford mentions Scheme right before that, so he's aware lambdas originated with Lisp, presumably. I guess he means JS was the first mainstream language to popularize them?


Yeah, that is why I think I can see the point with charity to the discussion. Still an awkward proclamation. Many people were coding with LISPs for a long time before javascript came onto the scene. And I don't think LISPs were the only language with lambdas?


The current XML standard is hot garbage since it completely disallows null characters, even via "&#0;", despite most languages now supporting nulls in the middle of strings. Also, JSON definitely allows schemas, primarily through the JSON Schema standard, but I've also seen TypeScript notation used for this, which has the convenience of being readable by more people (I strongly suspect more people know TypeScript than know XML schemas or JSON schemas combined).


JSON is garbage to read, largely due to how much needs escaping. This is mostly fine for smaller documents, but there is a reason YAML and TOML both gained traction over raw JSON for config files.

And I don't make any real defense of some of the darker corners of XML. In particular, I already criticized entities being a bit too much. Namespaces are also something that, while I can see the desire, the implementation is way too much for most of us.

JSON schema is going to be cursed for a long time. Just the odd treatment of it will be a problem. (In particular, that it is a subset of the numbers that javascript itself supports is... awkward.)

I also confess, though, that I'm not clear on why I would want a null in the middle of a string. That feels like a gun loaded and aimed squarely at a foot.


Most languages (C#, Java, Rust, JavaScript, etc.) support nulls in the middle of strings so it can be a security vulnerability if you try to serialize untrusted input to XML. I'd much rather be able to encode anything my input language considers a string and deal with excessive escaping than need to worry about what I'm going to do with inputs that my serialization language cannot support.


I'm curious what the vulnerability is? Also not clear what the null character is. Any links I can follow?

And again, if this is your line in the sand, how do you serialize NaN and Infinity in JSON?

Edit: Playing with this a bit, I'd actually assume that allowing \0 would be a vulnerability. I was curious how browsers treat it, so I see that parsing to an html document seems to just drop the characters? Fun little rabbit hole to jump in!


Yeah, that's why I consider it to be a breeding ground for vulnerabilities. People will probably just assume the XML serializer can handle any string in their language of choice and not handle those edge cases. What I ended up doing for my use case was to encode nulls as "&#0;" but within a CDATA section, so it was interpreted literally (choosing ambiguity over omission). The best way would probably be to have some sort of special <null /> element, but there is no such thing in the standard. There is xsi:nil, but that really indicates something else.
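
Roughly, the workaround looked like this (an illustrative Python sketch; the receiving side has to know to undo it, which is where the ambiguity comes in):

    def escape_nulls(text: str) -> str:
        # Emit the literal characters "&#0;" inside CDATA so the document stays
        # well-formed; a real NUL byte would make the XML invalid outright.
        return "<![CDATA[" + text.replace("\x00", "&#0;") + "]]>"

    escape_nulls("a\x00b")   # '<![CDATA[a&#0;b]]>'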


But what is the vulnerability? And what is a null character doing in a text document?

If you are just worried about data loss, having null allowed in text segments is already begging for failure, as C programs will almost certainly get them wrong.

If you are transferring binary, base64 or similar will already cover you.

And again, if this is a strike on xml, how do you represent NaN in a JSON document? Do what DynamoDB does and wrap all numbers in quotes?
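
For what it's worth, Python's standard library illustrates the awkwardness: by default it emits the non-standard NaN/Infinity literals, and in strict mode it just refuses, leaving you with the quote-it or null workarounds.

    import json

    json.dumps({"x": float("nan")})                   # '{"x": NaN}' -- not actually valid JSON
    json.dumps({"x": float("nan")}, allow_nan=False)  # raises ValueError
    json.dumps({"x": None})                           # '{"x": null}' -- drop the value
    json.dumps({"x": "NaN"})                          # '{"x": "NaN"}' -- quote it, DynamoDB-style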


From another interview:

>The best thing we can do today to JavaScript is to retire it. Twenty years ago, I was one of the few advocates for JavaScript. Its cobbling together of nested functions and dynamic objects was brilliant. I spent a decade trying to correct its flaws. I had a minor success with ES5. But since then, there has been strong interest in further bloating the language instead of making it better. So JavaScript, like the other dinosaur languages, has become a barrier to progress. We should be focused on the next language, which should look more like E than like JavaScript.

- https://evrone.com/douglas-crockford-interview

One of the traits that makes Douglas great is being willing to say the obvious even if it is politically unpopular.


Oh, hey. That's cool. I hadn't realised Douglas Crockford worked on E. I haven't actually looked, but I wonder who else participated?

E had some really cool ideas, it's sad that it doesn't seem to be that well known!


The biggest impedances I see to replacing JS are:

1. You've got to keep JS around for backwards compatibility for the billions of websites already using it.

2. You will need two engine teams, one to maintain JS and one for the new language.

3. Now you have a whole new vector for security issues. You've made the threat surface much broader. So, you will probably need to hire additional people.

4. You need to coordinate with all the other browser makers so everyone rolls out their new engines more or less concurrently. Other than experiments, nobody is going to start using it unless it works on all the major browsers and platforms.


That depends on the language you choose.

If we went to a scheme dialect as originally intended, we could have just ONE language for all the things.

Legacy JS? Just compile it into Scheme and run it.

HTML? Use S-expressions and support legacy HTML syntax by compiling it into them. Now you get all the power people want from template languages, but baked right into main language itself.

CSS? No more weirdness like adding sin() or calc() to make up for shortcomings. Once again, you get the power of the full Scheme language right there.


While both are good fits for their specific use cases, I think JSON won as a medium of exchange because, unlike XML, JSON is dead simple to parse and ingest programmatically.

What makes XML so unergonomic to ingest is 1) attributes, which don't map cleanly to a basic data structure that you might find in a programming language, and 2) namespaces, which are extremely, extremely tedious to program against.

Programmers are going to use the format that's the easiest to ingest and manipulate. JSON wins in that regard, hands down. Every time I need to write logic to ingest a namespaced XML document I heave a deep sigh and brace myself for another long week of fighting with LXML. But with JSON it's as easy as `json_decode($str)` and move on with your life.
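
To make the comparison concrete, a sketch with lxml (the namespace URI and element names are invented): every query against a namespaced document has to drag the namespace map along.

    import json
    from lxml import etree

    doc = etree.fromstring(
        b'<r:recipes xmlns:r="http://example.com/recipes">'
        b'  <r:recipe name="omelette"/>'
        b'</r:recipes>'
    )
    names = doc.xpath("//r:recipe/@name",
                      namespaces={"r": "http://example.com/recipes"})

    # versus
    names = [r["name"] for r in json.loads('{"recipes": [{"name": "omelette"}]}')["recipes"]]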


Namespaces, schemas, custom elements, client-side templating: XML has so much stuff that the web threw away, so now it's forced to reinvent worse versions of it every few years. A shame.

Abandoning XML was the web's biggest mistake.


Whenever XML gets discussed here it's interesting to see what people complain about. In my completely unscientific assessment most things people hate(d) were the overwrought "Enterprise" uses/systems.

Very unfortunately for everyone XML came up at the same time as peak "Enterprise" moat building. No design pattern went unused everything was built with mind numbing "configuration". XML got used heavily in that space because it allowed massive "Enterprise Objects" (local branding varies) to be serialized in a way another system might have a chance to read.

Meanwhile the features you mention got thrown out with the bathwater because everyone hated Enterprise-style architectures. While I don't love everything about XSLT, for instance, it's built directly into browsers as native code. How many person-hours, megabytes of JavaScript, and wasted CPU cycles have been spent reinventing client-side templating using JSON? XSLT is already right there and will happily convert serialized data to your presentation format. You also get the ability to have comments in the data and built-in schema validation.
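
For illustration, that "serialized data to presentation format" step can be this small (the data and stylesheet are invented, and lxml stands in here for the engine browsers already ship):

    from lxml import etree

    data = etree.XML('<recipes><recipe name="omelette"/></recipes>')
    stylesheet = etree.XML('''
      <xsl:stylesheet version="1.0"
          xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
        <xsl:template match="/recipes"><ul><xsl:apply-templates/></ul></xsl:template>
        <xsl:template match="recipe"><li><xsl:value-of select="@name"/></li></xsl:template>
      </xsl:stylesheet>''')

    print(etree.XSLT(stylesheet)(data))   # <ul><li>omelette</li></ul>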

On my current project I'd much rather be emitting and consuming XML than JSON. But alas, everyone hated Enterprise XML, so we're stuck with JSON: parsers that can't handle trailing commas, ambiguous definitions of numerics, and not a comment to be found.


XML is oversized for the majority use case.

It's easier to extend a simple standard than to amputate a behemoth with unneeded appendages.


The problem with extending a standard is that there are so many ways to do it.


> And after years of being too early at everything, the world had caught up to Doug.

Have we though? Earlier, the article even has Douglas saying:

> It turns out it, well, it’s a multi paradigm language, but the important paradigm that it had was functional. We still haven’t, as an industry, caught up to functional programming yet. We’re slowly approaching it, but there is a lot of value there that we haven’t picked up yet.

I do love the very ending:

Adam: What do you think is the XML of today?

Douglas: I don’t know. It’s probably the JavaScript frameworks.

They have gotten so big and so weird. People seem to love them. I don’t understand why.

For a long time I was a big advocate of using some kind of JavaScript library, because the browsers were so unreliable, and the web interfaces were so incompetent, and make someone else do that work for you. But since then, the browsers have actually gotten pretty good. The web standards thing have finally worked, and the web API is stable pretty much. Some of it’s still pretty stupid, but it works and it’s reliable.

And so, when I’m writing interactive stuff in browsers now, I’m just using plain old JavaScript. I’m not using any kind of library, and it’s working for me.

And I think it could work for everybody.

------

Earlier in the interview, where they were talking about how the people behind XML and SOAP wanted complexity and were upset by the simplicity of JSON, I was thinking that this resonated with me and how I feel about how complex web development has become with babel/webpack, transpiling, react/vue, etc. It feels like complexity for complexity's sake.


One of the reasons to prefer JSON over XML is that you can reasonably parse untrusted JSON using the default configuration without getting yourself pwned. A lot of XML processing libraries still enable external entities by default, so you have to disable them manually: https://cheatsheetseries.owasp.org/cheatsheets/XML_External_...


> you can reasonably parse an untrusted JSON using default configuration without getting yourself pwned.

If only this were true.

https://medium.com/r3d-buck3t/insecure-deserialization-with-...


I know that one, but I think JSON.NET is to blame for this because it decided to take `$type` and other fields and apply some reflection magic to them. It isn't really different from evaling a random JSON field in your own business code. A lot of sane JSON implementations don't do this either, like `JSON.parse`, `json.loads`, `json.Unmarshal`...

On the other hand, XML External Entities are part of the XML standard, so any standard-compliant XML implementation has to support them. This is why XXE attacks apply to many languages.


If it's not obvious, the issue is that standardizing a data format is going to have trade offs. Interoperability, leveraging tooling universally so all effort is going in the same direction, awesome. The problem is that some uses cases for the format are going to be insanely complex, which will make the standard and tools unnecessarily complex for the simple cases.

JSON is simpler and easier for many cases, but then you lose the interoperability. Go try to make an app right now dealing with Federal government systems or finance, you're going to end up translating JSON<->XML which isn't fun.

There's not going to be a silver bullet solution to this problem, it's not completely solvable.


> you're going to end up translating JSON<->XML which isn't fun.

Not fun? It's not even possible in the general sense.

If you have XML that looks like:

    <meal type="breakfast">
       <eggs count="3">
           <topping>cheese</topping>
       </eggs>
    </meal>
How would you convert that to JSON without knowing how the JSON consuming application expects it to be formatted? Where do you put the "breakfast" and "count" attributes?

You'd need to manually write a translator for each potential translation.
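
For instance, both of these are defensible renderings of that document (the second keeps attributes distinguishable with an "@" prefix, a convention some converters use), and a consumer written for one won't read the other:

    {"meal": {"type": "breakfast",
              "eggs": {"count": 3, "topping": "cheese"}}}

    {"meal": {"@type": "breakfast",
              "eggs": {"@count": "3", "topping": {"#text": "cheese"}}}}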


> You'd need to manually write a translator

Yep, therein lies the “not fun”. You write a bunch of super complex, brittle code.

Unfortunately because XML is entrenched in certain domains, you have to decide between writing these converters or doing everything in XML which also sucks, especially if you’re trying to write a modern app with a modern stack.


I remember one time designing the simplest and most readable data format ever, and coming up with Dixy [0] after removing everything I could while still keeping it usable.

I'm leaving it here because it will never be used for anything, but at least it may inspire somebody to design a better format with simplicity in mind.

[0] https://github.com/kuyawa/Dixy


This looks a lot like YAML, especially with the non-quoted strings, colons, and indentation. It also seems to share the problems of YAML, namely a very non-uniform syntax. For example, how do you distinguish null (denoted as "?") from a literal string containing one question mark? How do you distinguish the number 1 from the string "1"? Hence why I'm not a fan of both YAML and Dixy.

Other problems to ponder: Is 0 different from 00? Is "1, 2, 3, 4" different from "1,2,3,4"? Is "a: b" different from "a : b" and "a:b"?


I like this! It's like YAML but you can learn the entire spec in 15 seconds.


Why will it never be used for anything? I like it. Thank you for sharing.


"So Netscape thought they could do a similar thing for their navigator browser that, if they could get people programming in the same way that they did on HyperCard, on the browser, but now they can have photographs and color and maybe sound effects, it could be a lot more interesting, and you can’t do that in Java."

It's like the man never tried. Try a Java enabled browser: https://www.wikihow.com/Enable-Java-in-Firefox

Just as a reminder Minecraft (the most sold game in history) started out as an Applet.

Applets were not horrible because of the underlying technology; they were horrible because people made bad things with them, just like J2EE was a bad thing people made with J2SE.

But sometimes, rarely, people would make beautiful things with J2SE and J2ME and those are now removed from history forever under the banner of security like everything else that is good in life.


I've met Douglas a few times at JS conferences, and he is an excellent engineer (read up on his work on the NES version of Maniac Mansion). However, this passage about starting a company and trying to raise capital from VCs demonstrates that even excellent software engineers can be surprisingly myopic, dismissive, and naive about software businesses.

> Douglas: For me, the most difficult thing was raising money. You’re constantly going to Sandhill and calling on people who don’t understand what you’re doing, and are looking to take advantage of you if you can, and they’re going to do that, but you have to go on your knees anyway.

> I found that stuff to be really hard, although some of them I really liked. And sometimes I’d be sitting in those meetings and I’d be thinking, “I wish I was rich enough to sit on the other side of the table, because what they’re doing right now looks like a lot more fun than what I’m doing right now.” And it was even more difficult raising money then, because at this point, the dot-com bubble had popped and all VCs had been hurt really badly by that. So they were only funding sure things at that time, in late 2001, early 2002.

> And I thought we were a fairly sure thing, because we had already implemented our technology. And by this point, Chip and I understood the problem really well. And we had a new server and JavaScript libraries done in just a few months. And we had demonstrations. We could show the actual stuff. So it wasn’t like we were raising money so that we could do a thing. We had already done the thing, we needed the money so that we could roll it out. And that wasn’t enough for them. They wanted to see that we were already successfully selling it. And I was like, “If we could do that, we wouldn’t need you.”

Only they hadn't. They had built a demo of what we would later call a web 2.0 app. It wasn't even an application that solved a business problem or did anything specific. It was just showing the concept. That's not a product and that's not a business. The VC's point was: Show us proof that this idea has tangible benefits people will pay for.

The biggest misconception about VCs is that you raise money to "successfully sell" something you've built. You don't. You raise VC money to scale something that has value. So you need to communicate the business value, and ideally have proof points (either in the form of sales or data) that prove the value.

Of course Douglas found raising money difficult. But he doesn't seem to have the self awareness that this was probably due to him, and not the rich suits on the other side of the table.


For me 3 killer features of JSON are:

1. Parsing JSON doesn't require adding new firewall rules

2. There are no comments, so nobody will try to invent their own meta format or annotations in comments and instead they will put data in the JSON as they should

3. (When compared to JS) someone finally had the balls to pick one type of quotes, which makes writing a parser so much simpler.


Not supporting comments in JSON was a huge mistake. Yes I'm sure that someone, somewhere has once added comment directives to a file that caused issues. But that's such a rare problem compared to the very real and damaging and annoying problem of not being able to add comments to config files (hello package.json) that it's definitely the wrong choice.

XML supports comments and I have not seen a single use of comment directives in it ever.

I have seen plenty of comment directives in programming languages, HDLs and so on. But they are usually used as hints, e.g. to linters or to control compiler warnings, and they work perfectly well and cause no problems at all in my experience.

You might say that Crockford didn't anticipate JSON being used for config files. Fair enough. But now that it is, it should support comments.

My recommendation is to use JSON5 since it has a distinct file extension and fixes some other things about JSON too (e.g. trailing commas, hex constants) without being full on YAML insane.
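
For reference, a hypothetical JSON5 config showing what that buys you:

    {
      // comments, at last
      target: "es2020",      // unquoted keys
      include: [
        "src",
        "tests",             // trailing commas
      ],
      maxWorkers: 0x10,      // hex constants
    }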


directives can just be put in adjacent fields anyways


> There are no comments, so nobody will try to invent their own meta format or annotations in comments and instead they will put data in the JSON as they should

It also means it's a worse format for configs, where you sometimes need to annotate a few nodes with comments.


Yep.

"comment": keys get littered across the JSON... or temporary changes are copied and the original property name is invalidated with a prefix. The simple structure is gone, replaced with ad-hoc workarounds.

Similarly when you want to use a type not supported by JSON such as datetime or binary data, you might end up with "type":"binary" and use base64 or whatever in the value (shoehorning attribs) - when it really needs a schema to follow during parse and stringify. Or OpenAPI, which is hardly lightweight and really doesn't match the simplicity of JSON.


For configuration files I add an extra key "_" with the comment string as its value. I even add multiple "_" keys to the same object and have never seen anything break.


This is just a terrible comment format isn't it?


JSON could really use schemas as part of the main implementation.

Local schemas, not crazy remote schemas.

Or some sort of way to bless an "official" schema format.


My biggest gripe with XML is that it can't represent arbitrary strings easily. Even in the latest versions of XML, you can't easily serialize strings with embedded nulls since it is forbidden by the spec to even use something like "&#0;". XML 1.0 was even worse since it doesn't allow any characters which require surrogate pairs under UTF-16. Instead, the spec writers apparently expect devs to come up with their own escaping scheme in which case why bother having a standard at all?

Even C# just punts on this issue and won't emit valid XML if a string you serialize happens to have a null character in it.


If I had to deal with strings that XML won't allow, I'd probably just rely on encoding the data in Base64 before throwing it into the XML.

A human won't be able to read it (Unless you're crazy and have learned to read Base64), but the application still can easily. You'll just have to add a Base64 translation step before/after serialization/deserialization.
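
A minimal sketch of that translation step in Python (the element and attribute names are made up):

    import base64
    from xml.etree import ElementTree as ET

    raw = "payload with a NUL \x00 inside"

    elem = ET.Element("blob", encoding="base64")
    elem.text = base64.b64encode(raw.encode("utf-8")).decode("ascii")
    xml = ET.tostring(elem)   # b'<blob encoding="base64">cGF5bG9h...</blob>'

    decoded = base64.b64decode(ET.fromstring(xml).text).decode("utf-8")
    assert decoded == raw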


It's very annoying to do that though since that introduces a bunch of logic in the application and also removes the benefit of being able to read the strings in the XML as a human.


I worked on a customized ejabberd at a company for years, drinking all the XMPP kool-aid and becoming very familiar with XML along the way. Slowly we all began to realize how bad XML was. We eventually put our custom extensions' data into JSON just embedded inside the XML. Says a lot that such a hack was actually an improvement.

The other two premier XML use cases I can think of are

1. RSS: Last time I did this, ironically I built the payload with a JSON-API'd lib that deals with the XML drama for me. Worked fine.

2. Configs. Rarely are these done in XML anymore. Human readability matters for configs. But there are also better options than JSON for this.


3. HTML-like things where XML actually makes sense cause you're defining some sort of document with reusable objects that gets rendered at the end.


I bought one of the first books about XML, read it cover to cover; started writing my own parsers and generators, designed a custom XML protocol for a network server at work.

Then I had to live through the whole SOAP-drama, and Java EE; and ended up promising myself to never touch it again.

It has too many degrees of freedom for its own good, the C++ of data formats.

JSON is in many ways the other end of the spectrum; simple but underspecified and painful to deal with in anything but JS.

I often dream of something in-between.



What if I hate both formats? XML is overly verbose, while JSON isn't specific enough or precise enough for a lot of my needs.

- This message brought to you by TOML gang


I just checked out the spec, and it gets pretty ugly in the Table section. A lot of the json examples are both shorter and IMO more precise. Stuff that’s not allowed with [table] is allowed with [[table]], and it’s confusing to understand what level of depth I’m at.

I’ll take edn over any of ’em. https://github.com/edn-format/edn

Comments and time stamps allowed, arbitrary nesting of data structures, make your own tagged literals if you need them. And commas are whitespace, mostly unnecessary.


I've got yet another markup language for your hate group's target list :)

Come join the dark side where we enjoy the wonders of binary formats such as avro and protobuf.


I actually love binary formats, especially for network communication. We probably waste tons of processing power and network bandwidth needlessly sending JSON back and forth everywhere and re-deserializing it. I'm personally a fan of MessagePack.

Though for something where you want human readability it's hard to beat TOML in my opinion.



Absolutely agree. TOML is far and away the best for config files.


You can't be serious; I can't stand having to guess what kind of crazy markup is required to express things in toml. As a concrete example I converted my local kubeconfig (which is yaml) and here are the completely random characters indicating some kind of hierarchy

    apiVersion = "v1"
    current-context = ""
    kind = "Config"

    [[clusters]]
    name = "my-cluster"

    [clusters.cluster]
    certificate-authority-data = "LS0tL..."
    server = "https://example.com"

    [[contexts]]
    name = "context0"

    [contexts.context]
    cluster = "my-cluster"
    user = "my-user"

    [[contexts]]
    name = "context1"

    [contexts.context]
    cluster = "my-cluster"
    user = "my-user"

    [[users]]
    name = "my-user"

    [users.user]
    [users.user.exec]
    apiVersion = "client.authentication.k8s.io/v1beta1"
    args = ["eks", "get-token"]
    command = "aws"


As the other person who replied to you noted, a converted-from-YAML file, where the service is designed to use YAML rather than TOML, is hardly a good example of TOML. Of course your example is bad.

At least use a native toml file as an example.


It is really just arrays of objects/dicts that get complicated with TOML. For dictionary properties not inside an array, you can either fully specify the path `foo.bar.baz = 7`, or use a header like `[foo.bar]` and specify `baz = 7`.

Also, if I were handwriting that, I would probably make more use of dotted property names implying dictionaries, like so, which, though it has a bit more repetition in property names, seems easier to read:

    apiVersion = "v1"
    current-context = ""
    kind = "Config"

    [[clusters]]
    name = "my-cluster"
    cluster.certificate-authority-data = "LS0tL..."
    cluster.server = "https://example.com"

    [[contexts]]
    name = "context0"
    context.cluster = "my-cluster"
    context.user = "my-user"

    [[contexts]]
    name = "context1"
    context.cluster = "my-cluster"
    context.user = "my-user"

    [[users]]
    name = "my-user"
    user.exec.apiVersion = "client.authentication.k8s.io/v1beta1"
    user.exec.args = ["eks", "get-token"]
    user.exec.command = "aws"
If k8s had been designed with TOML in mind, it probably would have been structured differently, such that "contexts", for example, might just be a dictionary mapping names to an object holding the values from the "context" property. (The existing pattern of an array of objects where each object has a name but stores most of its properties in a property whose name matches the object's type is already weird, but doesn't look terrible in YAML.)

Such a schema, redesigned to be more TOML-friendly, would then look like this:

    apiVersion = "v1"
    current-context = ""
    kind = "Config"

    [clusters.my-cluster]
    certificate-authority-data = "LS0tL..."
    server = "https://example.com"

    [contexts.context0]
    cluster = "my-cluster"
    user = "my-user"

    [contexts.context1]
    cluster = "my-cluster"
    user = "my-user"

    [users.my-user.exec]
    apiVersion = "client.authentication.k8s.io/v1beta1"
    args = ["eks", "get-token"]
    command = "aws"


Json is just hipster xml. Jq is just hipster xslt.

Somebody should add a json entry to "the ascent of ward" [0]. Of course, it will be longer than all the previous versions combined, and the fields will appear in random order because dictionary.

[0] http://harmful.cat-v.org/software/xml/


To me, Douglas Crockford is the unofficial grandfather of JS. He is amazing, and I love hearing him speak!


> The success of JSON was totally serendipity. Getting the domain name definitely helped. There are some things that I didn’t do that definitely helped. I didn’t secure any intellectual property protection on it at all. I didn’t get a trademark for the name or for the logo. I didn’t get a copyright for the specification. I didn’t get a patent for the workings of the format. My idea was to make it all as completely free as possible. I don’t even require any kind of notice. No one has to say, “Thank you, Doug, for doing that.” It’s just free for everybody. And I think that definitely helped.


I'll never understand the hating that xml tends to get around here.

Choose the right tool for the job at hand. Sometimes json is the right choice, sometimes xml is. Not everything is a webapp.


It's an ugly tool. People generally hate ugly tools


based on the overwhelming majority of the top 30 comments, i think you should feel comforted.


What a pointless debate. I've worked with XML manuscript archives and I can be certain that if I'd had to do it in JSON I'd have killed myself.


This is about JSON being created or discovered and Doug struggling to convince people it was relevant when everyone was so bought in on XML.

Are you saying you think JSON shouldn't exist and everyone should use XML for everything?

Tooling around XML was certainly more established, but man there was a lot of complexity built up around it.


No. JSON is great as Javascript's serialization format, but it's not as readable and robust as XML, period.

I use both extensively, and for bigger objects and definitions, XML is a very clear winner.

I'm a big believer in a horses-for-courses type of approach, and my personal gripe is the push to replace one thing with another. These data types can coexist and can be used where they shine. XML can be read and written stupidly fast, so it's way better as an on-disk file format if people are gonna touch that file.

YAML and JSON are not the best fit for configuration files. JSON is good as an on-disk serialization format if humans aren't gonna touch it. XML is the best format for carrying complex and big data around. TOML is the best format for human-readable, human-editable config files.


My only quibble is that both are basically unreadable in most use cases. Most programs worth anything that use these formats strip out all the extra spaces and formatting, so you usually have to take an extra step to 'reformat' just so you can read it. And anyone who has hunted down a stray or missing paren or angle bracket knows how painful manually parsing a 400+ field file in one of these is. Trying to say one is better than the other ignores the use cases for both: one is good at getting data into JavaScript/Python, the other at light typing, annotation, and transforms.


I've never seen a tool that stores its XML config in a minified/uglified form by removing whitespace. The two biggest tools I play with which use XML are Keycloak and Eclipse, and neither of them does this.

All of the parsers I've used, and the editors I've edited XML in, have always shown the correct place where an angle bracket is missing or the XML is broken in any way, so I have never had to hunt anything down inside a big XML file.

However, this doesn't invalidate your experience about unreadable XML files, which are most definitely present in the wild.

However, I agree that none of them are good config file formats, but for storing data, I'll take XML all day, every day (except when I really need a binary file format, e.g. for compressing data).


What specifically about XML means it can be read/written "stupidly fast"?

It's still a text bound serialization format, you still have to parse a tree for it.

Is it just particularly mature libraries?


It is primarily mature libraries, but XML is also more straightforward to parse, because there are not many data types and the tags make it very deterministic.

By "stupidly fast", I mean I can read a 120K XML file, parse it, and create the objects defined in that file in under 2 ms. The library I use (RapidXML [0]) can parse the file in almost the same time it takes to run strlen() on it. That's insane.

[0]: https://rapidxml.sourceforge.net/


Being a maintainer of the fastest XML library for Rust, I strongly disagree that XML is inherently fast to parse, and I question any such claim which comes with no evidence. Especially when it has remained unchanged on their page since (at least) 2008 [0]. Have you actually tested that claim or are you taking it at face value?

IME the XML spec is so complex that you either end up with a slow but compliant parser or a fast one that doesn't implement the spec completely.

JSON, unlike XML, is minimal enough that writing an entire compliant parser with SIMD intrinsics [1] is actually practically feasible. That library claims 3 GBps parsing speed, which could theoretically process your 120kb of data in 1/25000th of a second instead of 2/1000ths of a second.

I would wager that JSON is faster to parse, on balance.

[0] https://web.archive.org/web/20080209172554/https://rapidxml....

[1] https://github.com/simdjson/simdjson


YAML is excellent as a post-natal abortion mechanism. Anyone working on a parser for it will question why to go on living when YAML exists. Source: I'm developing a YAML parser.

What broke me were: plain string and empty node handling.

Here is a fun quiz. Which of these two documents is valid: the first, the second, both, or neither? With explanation, ofc.

Yaml#1

     :
Yaml#2

    :


XML is great except at being a configuration format, a messaging format, a serialization format, or any other purpose really. It's not insane like YAML I'll give it that. I'll take XML over that garbage any day.


The complexity of XML reminds me of something from Adam Bosworth's ISCOC04 talk [0]. To me, the big takeaway is that HTML succeeded because of its limitations, not despite them. JSON seems very simple compared to XML. XML seems to be very powerful, but also very complex; it's like, if all you need to do is pick your kids up from soccer practice, you don't need the power (complexity) of the Space Shuttle in your vehicle.

  In 1996 I was at some of the initial XML meetings.
  The participants' anger at HTML for "corrupting"
  content with layout was intense. Some of the initial
  backers of XML were frustrated SGML folks who wanted
  a better, cleaner world in which data was pristinely
  separated from presentation. In short, they disliked
  one of the great success stories of software history,
  one that succeeded because of its limitations, not
  despite them. I very much doubt that an HTML that had
  initially shipped as a clean layered set of content
  (XML), layout rules (XSLT), and formatting (CSS) would
  have had anything like the explosive uptake.

[0] https://adambosworth.net/2004/11/18/iscoc04-talk/


But you don't have to use any more of the XML-related standards than you want to. You can ignore schemas, and add-on technologies like XPATH and XSLT and just use XML as a hierarchical tag-value format, just like JSON.

At this level they are both about equal in complexity: JSON has data types that XML doesn't, and XML has attributes and CDATA that JSON doesn't. JSON syntax is more succinct, but XML syntax is more regular.


XML is good for documents that don't have a regular markup (XHTML, DocBook, JATS, MathML, etc.) where you can mix content elements -- e.g. italic annotations.

JSON is good for structured data/records such as serialized data structures found in RPC protocols.

They both have their own pros and cons that make them suited to different use cases. Choose the one that best suits your data model and use cases.


Debate? Did you even read the article? It was about the history of how JSON came around. I didn't read a debate (despite what the title implies).


Honestly, I would relegate XML to application configuration. Trying to communicate with it with something like HTTP requests/responses is absurd.


I just had to comment on the irony of this comment being embedded in a document that is delivered via HTTP and very close to valid XML.

Even if XHTML fell by the wayside, HTML is imho a stereotypical example of where XML is a good fit. Most of the complexity has valid use cases, and it's mostly obvious what should be an attribute and what should be content of the tag. And at least in HTML 4 you even had a doctype tag filling the role of specifying the schema used. Of course, SVG is a better showcase for some other aspects of XML, with every editor putting its own metadata in, nicely partitioned into separate namespaces.


In broad strokes, I suppose you're right to see the irony. Even with that, we need specific client applications (aka browsers) to translate that into something readable.


(X)HTML can be pretty readable, if it's formatted well. If it wasn't, JSX wouldn't be a thing.

I think it's not so much about readability but about complexity. XML is meant to represent complex data, like complex rich text or nested vector graphics. That makes XML complex, conceptually, visually, and in implementation. If you use it to represent something that could have been a csv you're going to have a bad time (as everyone had in the 90s).


> HTML is imho a stereotypical example where XML is a good fit

Indeed, this was what XML was created for. From W3C's XML specification:

> The Extensible Markup Language (XML) is a subset of SGML that is completely described in this document. Its goal is to enable generic SGML to be served, received, and processed on the Web in the way that is now possible with HTML.

Honestly, what's absurd is GP comment's cluelessness.


When you can get the data as XML, verify via its schema externally and then transform it via XSLT, it's not.

Also, it's way better in transferring/storing big, complex intricate data like 3D objects.


I remember playing with invoice data in XML, sending it via email and opening it in a browser, where it was beautifully shown in all its visual glory using an XSLT directive in just one line at the top of the data file. Absolutely amazing. I wondered about all the implications and applications of just transmitting data that knew how to present itself to the user.
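
That one-line directive is the xml-stylesheet processing instruction; something like this at the top of the file (the href pointing at whatever stylesheet you ship alongside the data):

    <?xml version="1.0"?>
    <?xml-stylesheet type="text/xsl" href="invoice.xsl"?>
    <invoice>
      ...
    </invoice>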


> Also, it's way better in transferring/storing big, complex intricate data like 3D objects.

Curious how come?


I have a project which can work on 3D objects imported from STL files. My file format has more metadata and much more detailed information than a standard STL file, and all the data and metadata can be written in a way which is both readable and modifiable by a human if need be.

Having the same tags many times means the file can be nicely compressed, and its being XML means it can be verified independently with a schema (and the schema can be hosted at a remote location over HTTP if need be), too.

You can always store data more efficiently with binary formats, but XML DOM parsers allow accessing arbitrary parts of the tree instantly, so working with it is both easy and fast at the same time.


Thank you!


Agreed. That and knowledge representation (ontologies) is another good use case for XML, since JSON can't natively represent attributes (has-a relationships).


I'm in a similar sphere with XML and ontological representation. I've inherited maintenance of an ontology (of sorts) that has been used in social sciences since the 1940s. Can I ask what domain you are in? How do you like to represent your ontologies? SKOS?


Oh, I'm not using ontologies at all hah. Just interested in the idea of knowledge representation.


Doug says there's not a conservation of complexity, but I kind of disagree with that - the problem he was getting hung up on back in 2000 was that the original XML complexity was frickin' useless but the consultants and the capital were trying to keep it around anyway. If you don't know why a complex condition exists, you can't abstract the complexity away.


> Douglas: [...] It doesn’t look like it should be complicated. It’s just angle brackets, but the semantics of XML can be really complicated, and they assumed it was complicated for a reason.

> Adam: [...] He also wanted people to use JavaScript properly – use semicolons, use a functional style, don’t use eval, use JSLint and so on.

They could have done the same with XML, i.e. define a simple-XML subset without schema, CDATA, entities, etc. Instead they built it on top of another language that is so infamous that they felt the need to write JSLint.

> Adam: The thing they came up with, Doug’s idea for sending JavaScript data back and forth, they didn’t even give it a name. It just seemed like the easiest way to talk between the client side and the backend, a way to skip having to build XML parser in JavaScript.

So the original reason was that they could use eval(jsonstr)? Given the security implications, they would have been better off writing a JSON parser. At that point, is it any better than writing a simple-XML parser? At least that would have saved them from the "it's not a standard" discussions.
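
The hazard is easy to show (in Python here rather than the original JavaScript eval, but the point is the same): a parser only ever builds data, while eval runs whatever arrives.

    import json

    untrusted = '__import__("os").system("echo pwned")'

    # eval(untrusted)      # would execute the attacker's code
    json.loads(untrusted)  # raises json.JSONDecodeError instead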


> a lot of people started programming in this thing and were writing in a style of programming that the professional programmers of the day thought was impossibly hard, which was doing stuff based on events.

Not so different from today. That quote is about HyperCard, not JS, by the way.


I really hope that one day CUELANG will catch on to generate and validate JSON.

The current state of JSON generation/validation is simpler than the XML ecosystem, but a bit hackish.

We can have a much better stack.


> Oh, I did that. I didn’t intend to fight the federal government, sorry.

Seems politeness goes a long way when you're facing federal charges


I can't listen to this because the host sounds like a text to speech engine.


I read it. It's a quick read.



