A. It is often said that one of the advantages of SGML over some
other, proprietary, generic markup scheme is that "nobody owns
the standard". While this is not strictly true, the ISO's pricing
policy certainly has helped to keep the number of people who do own
a copy of the Standard at an absolute minimum.
[ Ed. note: I'm not exactly sure why this is seen as an advantage,
it's just something people say. ]
I don't know about SGML, but in XML attributes are subject to (mandatory) Attribute-Value Normalization whereas white space normalization for elements can be disabled if you have control over the parser.
Both elements and attributes are unsuitable for storing arbitrary data in a straightforward manner. If you absolutely must, you should know about normalization, XML whitespace handling, and CDATA.
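To make the difference concrete, here is a minimal sketch using Python's standard `xml.etree.ElementTree` (my choice of parser for illustration; the `item` element and `note` attribute are made up). The same text, containing a newline and a tab, is stored once as an attribute value and once as element content.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: identical text as an attribute value and as element content.
doc = '<item note="line1\nline2\tend">line1\nline2\tend</item>'

item = ET.fromstring(doc)

# Attribute-value normalization replaces each literal newline or tab with a space.
attr_value = item.get("note")

# Element content keeps its whitespace exactly as written.
text_value = item.text

print(repr(attr_value))  # 'line1 line2 end'
print(repr(text_value))  # 'line1\nline2\tend'
```

If the newline has to survive inside an attribute, writing it as a character reference (`&#10;`) is the usual workaround, since character references are not normalized away.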
If you find yourself racking your brain over whether to use elements or attributes, then you're attempting to use a markup language as a general-purpose data representation format, something it wasn't designed for.
SGML and XML are for text, optionally marked up with tags/elements. Attributes are for data about element presentation, and not intended to be displayed directly. It's as simple as that.
True, but SVG's saving grace is that it's designed to be used embedded in HTML (or in XHTML back then). But it doesn't make a lot of sense otherwise. E.g., take SVG2's switch to represent drawing order in a z-index-like property, as opposed to SVG's original "painter model" where drawing order is determined by document order. Basically, SVG doesn't make any use of distinguishing markup features such as element content (text elements being the only exception) and document order.
Usage of XML in business data and non-text document formats, OTOH, is an accident IMHO, but XML is still the most robust format for data exchange and archival in long-term commitment scenarios we've come up with so far, and I don't see that changing anytime soon because there aren't that many open standards being developed anymore.
Does SVG offer any advantages over, say, EMF/WMF? I know a strong advantage of WMF is that the file format translates 1:1 into GDI calls, which makes it very fast - but I don't see EMF files rendered with anti-aliasing or complex gradients. What about PDF or PostScript?
SVG would actually be great if it didn't suffer from such feature creep. Filters, masks, scripts and animations are supported only by some renderers, often inconsistently. This does not seem to stop SVG generators from producing documents that use these features, often unnecessarily.
Good SVG can even be human-readable and -editable. I've actually fixed simple broken SVGs with vim and a pocket calculator.
PostScript is great, but as a binary format it is not very modern, now that text-based formats seem preferred. I had to look up WMF, and dumping calls to a Microsoft API as an 'open' standard does not sound too exciting, either.
You are right on both counts! I seem to have mixed it with PDF.
Sadly, programming is coming to SVG as well with more and more renderers supporting JavaScript. Which might be great for some use cases, but again even further disperses the field of possible generator/feature/renderer combinations that might (will) fail.
Oh dear god no. Trying to parse EMF+ is a bit of a nightmare, and it’s not just 1:1 GDI calls: there are dual EMF/EMF+ files, it’s stack based (I suppose not so bad), with kludge after kludge (especially around text)... we use it because we have to.
Simplicity of implementation. GDI's drawing primitives exist in practically all other drawing APIs so translating it to Cairo, OpenGL, or Direct2D should be trivial.
> I'm confused. SGML/XML is capable of representing arbitrary tree structures (with metadata attached to the nodes), is it not?
So is ASN.1, but nobody in their right mind would use ASN.1 as a markup
language. The sole fact that it is possible to cram the structure into the
format doesn't mean it's a good idea.
Wait. A tree is just nested tags, right? I feel like I'm missing something.
Here I'm thinking of the correspondence between trees and strings of nested parentheses. So, strings like
(a (b c) d) (e f)
carry a tree structure, and any (non-associative) multiplication is just a fold operation over a tree. In this case it's like we tag each set of parentheses with a name/operation, so naively XML would seem to naturally represent this kind of thing.
I have little actual experience using XML directly though, so I am genuinely curious as to what is so terrible or "crammy" about my ideas here.
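For what it's worth, the idea can be sketched in a few lines of Python with the standard `xml.etree.ElementTree`. The tag names `add`, `mul` and `n` and the `OPS` table are invented for this example; each element names an operation and the fold walks the tree bottom-up.

```python
import functools
import xml.etree.ElementTree as ET

# Hypothetical document: inner elements name an operation, <n> leaves carry numbers.
doc = "<add><mul><n>2</n><n>3</n></mul><n>4</n></add>"

# Assumed mapping from tag name to a binary operation.
OPS = {
    "add": lambda a, b: a + b,
    "mul": lambda a, b: a * b,
}

def fold(node):
    """Evaluate a node by folding its operation over its evaluated children."""
    if node.tag == "n":
        return int(node.text)
    values = [fold(child) for child in node]
    return functools.reduce(OPS[node.tag], values)

result = fold(ET.fromstring(doc))
print(result)  # (2 * 3) + 4 = 10
```

So the structure does map over cleanly. Note that the leaves come back as strings and have to be converted by hand, which is where the friction with plain data tends to start.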
There are a lot of weird ideas floating around in the comments on this link. I don't think you are on the wrong track.
There is some mismatch between what XML was designed for and the problems XML is good at solving. It was most certainly designed to be easy for humans to read and write manually. In practice, it is a great interchange format, by which I mean the specific idea of different parties writing XML for exchange between each other, because it can be validated mechanically. There is a large and powerful ecosystem of software that has sprung up around it which simply isn't there for s-exps or JSON.
Elsewhere on here, marcoperaza points out that it is kind of against hacker ethic. That's true. But enterprise software often involves multiple separate organizations having to agree on what a document can contain, and XML is great for that, and that use case tends to be more valuable in industry than whether it is the tersest, most flexible or readable format.
And really, the rule of thumb about attributes vs. elements is more about whether you are going to need children of an item: it's easier to extend elements with child nodes.
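A small sketch of that rule of thumb, using Python's standard `xml.etree.ElementTree`. The `person`, `phone`, `number` and `extension` names are hypothetical; the point is only that an attribute is a flat string while an element can grow structure later.

```python
import xml.etree.ElementTree as ET

# Hypothetical record where the phone number is an attribute.
v1 = ET.fromstring('<person name="Ada" phone="555-0100"/>')

# The same field as an element, later extended with child nodes.
v2 = ET.fromstring(
    '<person name="Ada">'
    '<phone><number>555-0100</number><extension>42</extension></phone>'
    '</person>'
)

phone_attr = v1.get("phone")          # one flat string, nothing to nest under it
phone_elem = v2.find("phone")         # an element that can carry children
number = phone_elem.findtext("number")
extension = phone_elem.findtext("extension")

print(phone_attr, number, extension)
```

Adding the extension to `v1` would mean either a second attribute or packing two values into one string, whereas `v2` just gains a child.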
Almost all characters are permitted in names, except those which either are or reasonably could be used as delimiters. The intention is to be inclusive rather than exclusive, so that writing systems not yet encoded in Unicode can be used in XML names.
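As a quick illustration of that inclusiveness, here is a sketch with Python's standard `xml.etree.ElementTree`. The element names are arbitrary examples I picked: non-ASCII names parse fine, while a delimiter character inside a name is rejected.

```python
import xml.etree.ElementTree as ET

# Arbitrary example names using non-ASCII characters.
ROOT_NAME = "données"
CHILD_NAME = "名前"

doc = f"<{ROOT_NAME}><{CHILD_NAME}>x</{CHILD_NAME}></{ROOT_NAME}>"
root = ET.fromstring(doc)
child = root[0]

# A delimiter such as '<' inside a name makes the document ill-formed.
try:
    ET.fromstring("<a<b>x</a<b>")
    rejected = False
except ET.ParseError:
    rejected = True

print(root.tag, child.tag, rejected)
```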
I'm genuinely perplexed at that animosity towards XML. In that second link I'm unable to find any substantive problem other than that "XML endtags make it too verbose". That seems like a legitimate thing to worry about when considering serialization formats, but where is all the vitriol coming from?
Beyond basic usage, XML is pretty complex: schemas, namespaces, XSLT, XSD, DTD, XPath and XQuery.
There are plenty of good arguments for the XML way of doing things. For example, having a rigorously defined way (XSLT) to specify transformations of schema-conforming XML is more robust than ad-hoc code that wrangles schemaless JSON.
But it does go against the hacker ethos and stands in the way of rapid development. And wherever it is used, complexity and verbosity seem to often follow. Look at SOAP, for example.
Yes. As a tree structure, basic xml is fine. There are lots of other ways to do it, but there's really not much difference. For communication, agreement on a common language is more important than its intrinsic merits. (and xml partly attained that because it looks like html, which was still new at the time).
But the xml ecosystem is horrible. Sensible ideas, horrific execution; like namespaces and schema. Probably the single worst problem was using xml syntax itself: it's like, a programming language that uses JSON for its syntax.
But also, there's guilt by association: people hate the enterprise culture that uses xml. Something similar happened to Java.
Though xpath is not so bad, and many people seem to quite like it.
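For anyone who hasn't used it, a minimal taste of XPath, via the limited XPath subset that Python's standard `xml.etree.ElementTree` supports. The `library` document and its titles are made up for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical data set.
doc = """
<library>
  <book year="2001"><title>Alpha</title></book>
  <book year="2005"><title>Beta</title></book>
  <book year="2005"><title>Gamma</title></book>
</library>
""".strip()

root = ET.fromstring(doc)

# Select the titles of all books whose year attribute equals 2005.
titles_2005 = [t.text for t in root.findall("./book[@year='2005']/title")]

print(titles_2005)  # ['Beta', 'Gamma']
```

One path expression replaces the loop-and-filter code you would otherwise write by hand. Full XPath (axes, functions) needs a third-party library; the standard module only covers a subset.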
Finally... json is a better match for data, basically by being c-like. However, an ecosystem tumour is also growing, around JSON. Some even use json syntax itself...
I wonder, if perhaps, a root issue is that the world is complex, and youthful simplicity is corrupted as it adapts to cope with the real world...
There is hope, however; tools like `jq` never existed for xml.
JSON and XML aren't really equivalent though. XML is a markup language, JSON is a data serialization format. Of course a lot of hate for XML comes from people trying to use it to serialize data, which it can do but only in a clunky way. JSON is just plain better suited for what most people need.
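A rough sketch of what "clunky" means in practice, using only Python's standard library. The `record` dictionary and the element names are invented; the XML layout is just one of many possible mappings, which is itself part of the problem.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical record with a string, a number and a list.
record = {"name": "widget", "count": 3, "tags": ["a", "b"]}

# JSON: one call each way, and the types survive the round trip.
from_json = json.loads(json.dumps(record))

# XML: the mapping has to be invented and written by hand.
root = ET.Element("record")
ET.SubElement(root, "name").text = record["name"]
ET.SubElement(root, "count").text = str(record["count"])
tags = ET.SubElement(root, "tags")
for tag in record["tags"]:
    ET.SubElement(tags, "tag").text = tag

parsed = ET.fromstring(ET.tostring(root, encoding="unicode"))

# Everything comes back as text; converting to int or list is up to the reader.
count_text = parsed.findtext("count")
tag_texts = [t.text for t in parsed.find("tags")]

print(from_json == record, repr(count_text), tag_texts)
```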
Thank you (and @marcoperaza) for unpacking things a bit. I was mostly unaware of the sideband technologies that go along with the basic tag and attribute structure, and your comments about the community and "guilt by association" got me thinking about library support and other practicalities which I hadn't been considering. Thanks :)
This also got me reading up on various structured-data formats: XML, YAML, JSON, TOML, HCL, etc. I'd really like some big table comparing various features but can't seem to find anything of the sort.
I found a link [0] that has comparisons between JSON, TOML and YAML representations for various types of data. It's neat to see how each becomes more or less verbose depending on the kind of data getting encoded.
JSON should only really be used in a machine-to-machine context, i.e. for serialization. If you want something that's easily editable with a text editor, then use YAML. XML is horrible for hand-editing, but it is easier to read than either YAML or JSON, is super flexible, has heaps of support in terms of tooling and libraries, and is well understood, so it's great for information interchange. Verbosity isn't a huge issue for machines, and it is actually helpful when doing data integration. Concerns about wasting space are spurious, since any redundancy in the data can easily be removed with some basic data compression. And as somebody else mentioned, XPath is actually pretty good, and makes the verbosity a net positive for certain applications.
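The compression point is easy to check with a small sketch using Python's standard `zlib`. The `customers` document is synthetic and highly repetitive, so it flatters the result; real data will compress less dramatically.

```python
import zlib

# Synthetic, deliberately verbose XML with long repeated tag names.
records = "".join(
    f"<customer><identifier>{i}</identifier><status>active</status></customer>"
    for i in range(1000)
)
raw = f"<customers>{records}</customers>".encode("utf-8")

# Generic compression removes most of the tag redundancy.
packed = zlib.compress(raw)
ratio = len(raw) / len(packed)

# The round trip is lossless.
restored = zlib.decompress(packed)

print(len(raw), len(packed), round(ratio, 1))
```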
I think `jq` is more powerful, but for those afflicted with xml instead of json, there is `xmllint`. In a pinch, one can use it to hack something together (along with xpath and xslt).
XML is fine if all you want is a human-readable format for defining tree data structures, such as documents, in applications where only strings are used and where someone needs the semantics of each node and each attribute spelled out clearly and unequivocally in the document structure itself.
For any other case, XML is horrible.
Now, consider that XML is used quite extensively in exactly those other cases, well beyond the tree-based DOM data structures it was designed for.