Because it was popular, especially in web-development-adjacent circles back then...

michaelt · on Feb 12, 2021

> Also, your example is rather strawman-ish: there is no point in using separate elements for individual coordinates, and anyone designing the format would know this.

You say that, but here's some real-world XML from the widely used 'GPX' file format[1]:

      <trkpt lat="47.644548" lon="-122.326897">
        <ele>4.46</ele>
        <time>2009-10-17T18:37:26Z</time>
      </trkpt>

The truth is I could have put the X coordinate as an attribute and the Y coordinate as a child element and it would still have been a fair representation of real-world XML documents.
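To make the split concrete, here is a minimal Python sketch that reads the fragment above with the standard library's `xml.etree.ElementTree`. The `read_point` helper and the flat dict it returns are assumptions made for illustration, not part of GPX. A complete GPX file also declares a default namespace, so lookups there would need namespace-qualified names; the bare fragment does not.

```python
import xml.etree.ElementTree as ET

# The <trkpt> fragment quoted above, unchanged apart from indentation.
GPX_POINT = """
<trkpt lat="47.644548" lon="-122.326897">
  <ele>4.46</ele>
  <time>2009-10-17T18:37:26Z</time>
</trkpt>
"""


def read_point(xml_text):
    """Return the four fields of one track point as a flat dict (hypothetical helper)."""
    pt = ET.fromstring(xml_text.strip())
    return {
        # Latitude and longitude are stored as attributes...
        "lat": float(pt.get("lat")),
        "lon": float(pt.get("lon")),
        # ...while elevation and time are stored as child elements.
        "ele": float(pt.findtext("ele")),
        "time": pt.findtext("time"),
    }


point = read_point(GPX_POINT)
print(point)
```

Four fields of the same record need two different access paths (`get` versus `findtext`), which is the inconsistency being pointed at.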

And GPX is one of the better XML formats! You want to see nightmare XML? Go look at SAML.

[1] https://en.wikipedia.org/wiki/GPS_Exchange_Format#Sample_GPX...

lolinder · on Feb 12, 2021

These are still examples of crappy DSLs that use XML syntax, not specific problems with XML itself. Designing good DSLs is a hard problem, but is it any less hard in JSON or YAML? If so, it would be helpful for me if people could provide specific examples of why XML itself leads to poor DSLs. As it stands, I'm left wondering if there are so many bad XML DSLs simply because XML is (has been) a popular format.

gugagore · on Feb 12, 2021

Pertinent to this specific discussion is that XML distinguishes syntactically between "elements" and "attributes", but it's often not clear what the purpose of the distinction is (search for "elements vs attributes").

JSON and YAML do not make such a distinction.
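As a rough illustration, the GPX point quoted earlier in the thread could be written as a single JSON object. The key names below are hypothetical (there is no standard JSON form of GPX implied here); the sketch only shows that every field is reached the same way.

```python
import json

# Hypothetical JSON rendering of the GPX point; key names are illustrative only.
POINT_JSON = (
    '{"lat": 47.644548, "lon": -122.326897, '
    '"ele": 4.46, "time": "2009-10-17T18:37:26Z"}'
)

json_point = json.loads(POINT_JSON)

# All four fields are plain keys: there is no attribute/child split to choose between.
fields = sorted(json_point)
print(fields)
```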

lolinder · on Feb 13, 2021

True, but the lack of a clear distinction also seems like a weakness in the specific DSLs rather than a weakness in the format. HTML has (with some exceptions) a very clear distinction between them: attributes are for metadata, elements are for visible page content. Android's XML draws a similar distinction, with attributes used to describe properties of a view and elements used to describe child views.

(I do concur with others who have noted here that the "elements" versus "attributes" distinction makes XML a poor choice for serialization, but XML as a serialization format isn't really the issue here.)

DonHopkins · on Feb 13, 2021

XML was based on SGML, which was originally intended for marking up text documents, so TEXT was given a special place in the Pantheon of Nodes, even above Attributes.

So using XML to represent data structures instead of marking up text can be pretty awkward, inefficient, and nuanced.

XML attributes are second-class citizens compared to TEXT nodes (which can contain CDATA sections), because attributes undergo "Attribute-Value Normalization": line breaks are normalized, entity references are expanded, and each literal white-space character is replaced with a space. For attributes declared with a type other than CDATA, leading and trailing spaces are also removed and runs of spaces are collapsed to a single space.

SVG path attributes (as well as simple values like numbers, booleans, enums, etc) are impervious to Attribute Value Normalization corruption, because they don't depend on white space being perfectly preserved (by design, of course), so they are fine to put in attributes.

But if you really care about preserving the exact value of a string, like a password or arbitrary string, you should use <![CDATA[ ]]> in a text node, not an attribute!

I Wanna Be <![CDATA[ https://donhopkins.medium.com/twenty-twenty-twenty-four-esca... ]]>

https://www.w3.org/TR/xml/#AVNormalize

3.3.3 Attribute-Value Normalization

Before the value of an attribute is passed to the application or checked for validity, the XML processor must normalize the attribute value by applying the algorithm below, or by using some other method such that the value passed to the application is the same as that produced by the algorithm.

All line breaks must have been normalized on input to #xA as described in 2.11 End-of-Line Handling, so the rest of this algorithm operates on text normalized in this way.

Begin with a normalized value consisting of the empty string.

For each character, entity reference, or character reference in the unnormalized attribute value, beginning with the first and continuing to the last, do the following:

For a character reference, append the referenced character to the normalized value.

For an entity reference, recursively apply step 3 of this algorithm to the replacement text of the entity.

For a white space character (#x20, #xD, #xA, #x9), append a space character (#x20) to the normalized value.

For another character, append the character to the normalized value.

If the attribute type is not CDATA, then the XML processor must further process the normalized attribute value by discarding any leading and trailing space (#x20) characters, and by replacing sequences of space (#x20) characters by a single space (#x20) character.

Note that if the unnormalized attribute value contains a character reference to a white space character other than space (#x20), the normalized value contains the referenced character itself (#xD, #xA or #x9). This contrasts with the case where the unnormalized value contains a white space character (not a reference), which is replaced with a space character (#x20) in the normalized value and also contrasts with the case where the unnormalized value contains an entity reference whose replacement text contains a white space character; being recursively processed, the white space character is replaced with a space character (#x20) in the normalized value.

All attributes for which no declaration has been read should be treated by a non-validating processor as if declared CDATA.

It is an error if an attribute value contains a reference to an entity for which no declaration has been read.
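The rules quoted above can be observed with Python's standard `xml.etree.ElementTree`, which is built on the expat parser. This is a small sketch, not a conformance test: the element and attribute names (`r`, `literal`, `charref`, `spaces`, `text`, `cdata`) are made up for the example, and because no DTD is supplied every attribute is treated as CDATA-typed, so the extra trimming and collapsing step for non-CDATA types does not apply.

```python
import xml.etree.ElementTree as ET

# "\n" and "\t" below are real newline/tab characters in the document text;
# "&#10;" and "&#9;" are character references to the same two characters.
DOC = (
    '<r literal="a\n\tb" charref="a&#10;&#9;b" spaces="  a   b  ">'
    '<text>a\n\tb</text>'
    '<cdata><![CDATA[  a\n\tb & <c>  ]]></cdata>'
    '</r>'
)

root = ET.fromstring(DOC)

# Literal white space inside an attribute value: each character becomes a space.
literal = root.get("literal")
# Character references: the referenced characters themselves are kept.
charref = root.get("charref")
# CDATA-typed attribute: runs of spaces and leading/trailing spaces are kept.
spaces = root.get("spaces")
# Element content and CDATA sections are not subject to attribute normalization.
text = root.findtext("text")
cdata = root.findtext("cdata")

for name, value in [("literal", literal), ("charref", charref),
                    ("spaces", spaces), ("text", text), ("cdata", cdata)]:
    print(f"{name:8} {value!r}")
```

So an attribute can carry an exact newline or tab only if the writer escapes it as a character reference, while a text node or CDATA section carries it as-is.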

DonHopkins · on Feb 13, 2021

Some of the most incredibly awfully bad XML DSLs are official standards, themselves. COUGH XMLSchema COUGH

You should read some of James Clark's criticisms of the official XML Schema (XSD) standard, which motivated him to develop TREX (Tree Regular Expressions for XML), which he combined with Makoto Murata's RELAX (REgular LAnguage description for XML) to create Relax/NG.

https://en.wikipedia.org/wiki/James_Clark_(programmer)

https://en.wikipedia.org/wiki/Makoto_Murata#RELAX_and_RELAX_...

>Some people, including Murata and James Clark, had critical attitudes toward XML Schema. XML Schema is a modern XML schema language designed by W3C XML Schema Working Group. W3C intended XML Schema to supersede traditional DTD (Document Type Definition). XML Schema supports so many features that its specification is large and complex. Murata, James Clark and those who criticised XML Schema, pointed out the following:

>It is difficult to implement all features of XML Schema.

>It is difficult for engineers to read and write XML Schema definitions.

>It does not permit nondeterministic content models.

>Murata and collaborators designed another modern schema language, RELAX (Regular Language description for XML), more simple and mathematically consistent. They published RELAX specification in 2000. RELAX was approved as JIS and ISO/IEC standards. At roughly the same time, James Clark also designed another schema language, TREX (Tree Regular Expressions for XML).

>Murata and James Clark designed a new schema language RELAX NG based on TREX and RELAX Core. RELAX NG syntax is the expansion of TREX. RELAX NG was approved by OASIS in December 2001. RELAX NG was also approved as Part 2 of ISO/IEC 19757: Document Schema Definition Languages (DSDL).

https://en.wikipedia.org/wiki/Regular_Language_description_f...

https://en.wikipedia.org/wiki/RELAX_NG

https://en.wikipedia.org/wiki/XML_Schema_(W3C)

Schema Wars: XML Schema vs. RELAX NG (1/2) - exploring XML

https://web.archive.org/web/20180429143242/http://webreferen...

https://web.archive.org/web/20180429145711/http://webreferen...

https://news.ycombinator.com/item?id=22756875

>James Clark used Haskell to design and implement an algorithm for validating Relax NG XML schemas (he co-designed Relax NG, and designed its predecessor TREX), to work the ideas out before re-implementing it in (many many more lines of tedious brittle) Java (JING). Haskell works wonderfully as a design and standard definition language, that way.

https://news.ycombinator.com/item?id=25435678

>James Clark's compact syntax for Relax/NG XML schema validation language is quite tastefully designed, an equivalent but more convenient alternative syntax than XML, for writing tree regular expressions matching XML documents. It's way more beautiful and coherent than the official "XML Schema" standard.

[...]

>There's a wonderful DDJ interview with James Clark called "A Triumph of Simplicity: James Clark on Markup Languages and XML" where he explains how a standard has failed if everyone just uses the reference implementation, because the point of a standard is to be crisp and simple enough that many different implementations can interoperate perfectly.

>A Triumph of Simplicity: James Clark on Markup Languages and XML:

https://web.archive.org/web/20130721072712/https://www.drdob...

"The standard has to be sufficiently simple that it makes sense to have multiple implementations." -James Clark

pwdisswordfish6 · on Feb 13, 2021

Fair enough. But SVG was defined by the W3C, the very same organisation that standardised XML itself. If anyone, they would know what they’re doing.

wtetzner · on Feb 12, 2021

> And there weren’t many open-standard extensible general-purpose structured data formats.

The thing is, just having an XML parser isn't enough to parse SVG. You also have to parse a DSL. So I think the argument is "why not just use only a DSL, that does a good job of describing the data model?"
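A minimal Python sketch of that two-stage parse, assuming a made-up one-path document: the XML parser hands back the `d` attribute as a plain string, and a second, hand-written parser is needed for the path data. The token pattern is deliberately simplified and does not cover the full SVG path grammar (exponents and numbers run together such as `1.5.5` are ignored), and `parse_path` is a hypothetical helper.

```python
import re
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
# Hypothetical document, invented for this sketch.
SVG_DOC = f'<svg xmlns="{SVG_NS}"><path d="M 10 20 L 30.5,40 l -5 -5 Z"/></svg>'

# Simplified tokens: a single command letter, or a signed decimal number.
TOKEN = re.compile(r"[A-Za-z]|[-+]?(?:\d+\.?\d*|\.\d+)")


def parse_path(d):
    """Group path data into (command, [numbers]) pairs (simplified grammar)."""
    commands = []
    for tok in TOKEN.findall(d):
        if tok.isalpha():
            commands.append((tok, []))
        elif not commands:
            raise ValueError("path data must start with a command letter")
        else:
            commands[-1][1].append(float(tok))
    return commands


# Stage 1: the XML parser only gets as far as the attribute string.
root = ET.fromstring(SVG_DOC)
d = root.find(f"{{{SVG_NS}}}path").get("d")

# Stage 2: a separate parser for the embedded DSL.
commands = parse_path(d)
print(commands)
```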

gugagore · on Feb 12, 2021

The boundary where the XML ends and the DSL begins is at least a little arbitrary, but it does have important ramifications. By using XML, you can use the XPath query language to index down to a `path` (the end of the XML), but not within a `path` (the beginning of the DSL). That means, at least conceptually, you can attach colors, and events, and such to a path. And the fact that you cannot do it within a path (not without introducing a query language for the DSL) is inelegant, but also might not have much bearing in 99% of applications.
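Here is a small sketch of where that boundary falls, using the limited XPath subset in Python's `xml.etree.ElementTree`. The document, its `id` values, and the `fill` attribute are invented for the example.

```python
import xml.etree.ElementTree as ET

NS = {"svg": "http://www.w3.org/2000/svg"}
# Hypothetical document, invented for this sketch.
SVG_DOC = (
    '<svg xmlns="http://www.w3.org/2000/svg">'
    '<g id="roads"><path id="a" fill="red" d="M0 0 L10 10"/></g>'
    '<path id="b" d="M5 5 L6 6"/>'
    '</svg>'
)
root = ET.fromstring(SVG_DOC)

# Queries can select path elements and filter on their attributes...
all_ids = [p.get("id") for p in root.findall(".//svg:path", NS)]
red_ids = [p.get("id") for p in root.findall(".//svg:path[@fill='red']", NS)]

# ...but they stop at the attribute: "d" comes back as one opaque string,
# and nothing in the query language can address a segment inside it.
d_value = root.find(".//svg:path[@id='b']", NS).get("d")

print(all_ids, red_ids, repr(d_value))
```

Anything finer-grained than a whole `path`, such as styling a single segment, has to be handled by application code that parses `d` itself.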

pwdisswordfish6 · on Feb 12, 2021

Because then you’d have to define its syntax and extension points. XML and namespaces solve that problem for you, so the only thing left for you to design is the actual structure of your data.

wtetzner · on Feb 12, 2021

Except that's not what happened. They also had to define new syntax for the DSLs they embed in strings.

pwdisswordfish6 · on Feb 12, 2021

True. But it would have been much harder if they had to define a syntax for the entire document.

I imagine they thought that the path DSL is simple enough to parse (and I seem to vaguely recall PostScript has something similar), while the overhead of representing path nodes as XML elements would be too high.

wtetzner · on Feb 12, 2021

I'm not arguing against using an existing format, BTW, just seems like a weird mixture here.

I can't help but think s-expressions would have been a better choice.

foolmeonce · on Feb 12, 2021

I think its relationship with HTML, JavaScript and CSS would be a lot worse if it didn't reuse XML to the extent that it did.