XML definitely adds features over sexps, and is good for adding markup to primarily text-based documents, where the concept of bare text has an intuitive meaning, and elements denote a formally defined machine-readable layer on top.
However, the fight occurs when XML is used as a generic serialization format, or even as syntax for a programming language. There, the features only add redundant complexity (what does unquoted text mean in an ant build file? why am I putting quotes around identifiers, and why am I including the redundant end tags when my editor indents for structure?). When your format already has specific semantics, the wiggle room between element/attribute/text only adds inflexible accidental complexity.
JSON is another contestant in the generic data serialization space, which is doing better than sexp - possibly due to the ubiquity of javascript (free-riding the web). Though, for this technology generation, the fight is over.
I think there'll be another chance when SOA/web services are superseded; but there's still huge scope for improvement within their present architecture/ecosystem (e.g. REST vs. SOAP vs. ?).
For programming languages, sexps are competing against all the other language syntaxes out there, not just XML. I agree that it seems odd that ant uses XML though... perhaps the extensibility of ant is easier with some kind of generic format (like XML/sexp/JSON)? yet other languages manage to be extensible via functions/classes/modules etc.
Thank goodness no one uses JSON to encode a language (in the way that ant uses XML).
Yeah, I know you meant that. The point is people don't do it because they don't have to. In XML, by contrast, your example has been written countless times. For a few years it was almost de rigueur.
JS's notation for data and code may not be identical, but they're close enough to get things done (that's JS's Lisp heritage showing). Since XML was explicitly designed to prevent people from doing a bunch of things they needed, it's not surprising that what resulted were monstrosities.
I suspect that XML is a manifestation of the kind of people who like to lock things down and specify them up front, until they're so tied up in knots of their own making that they form a committee to design the same thing all over again. As you may guess, I'm of the opposite camp. Happily, I can work in my medium and leave them to theirs.
Let's assume that an identifier array that starts with the word "function" is a function declaration, and takes the additional positional arguments name, argList, and body. Now we get something like this:
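(A sketch of what that might look like - the two-argument add function and the exact shape are just for illustration:)

["function", "add", ["x", "y"],
    ["+", "x", "y"]]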
Yeah, I was thinking that. You're also omitting an explicit return, I would guess by assuming the value of the last statement (as an expression) is to be returned (this won't work in general, because Javascript allows multiple returns, like C or Java). It's closer to a lisp syntax, by using nested lists instead of structs/maps. I submit that it's against the spirit of JSON to be able to name the values, and then not use that ability. I think you're Greenspunning it ;-). Sure is shorter though. :-)
(basically lisp syntax substituting JS array notation.) My point is that JSON is capable of being terse, and XML is not. XML attributes are unordered, so you have to use child nodes, which have to be named. The best you could do is:
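(Again a sketch, with made-up element names, for the same add function:)

<function>
  <name>add</name>
  <argList>
    <arg>x</arg>
    <arg>y</arg>
  </argList>
  <body>
    <call>
      <name>+</name>
      <arg>x</arg>
      <arg>y</arg>
    </call>
  </body>
</function>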
Which is significantly longer than a positional JSON serialization. XML is also harder to implement a parser for, and existing libraries tend to be difficult to use (python etree and ruby's implementations are a much better direction). Now, someone else's raw XML is often easy to understand, whereas my array-based JSON format would clearly require domain knowledge. Because of this, I prefer JSON for small network requests that are consumed by scripting languages.
For larger disk files, the overhead of XML is marginalized, and the extra formatting might help in hand editing and error correction.
As for Greenspunning, I think it's a perspective issue. The example was one of code serialization, so the lisp syntax is particularly well suited to the problem. Programmers also have the domain knowledge, so the less verbose format is still easy to understand.
I see what you're saying. But I use json to encode my latest language (release in the coming week). But it's a really limited case. It would suck in the general case, but it's not always terrible (I hope). Just sayin.
It's an embedded language for xml/html data extraction/scraping. It's inspired by MQL ( http://www.freebase.com/tools/queryeditor/ , look for the examples link towards the bottom), which should give you an idea of how json can be used to represent structure.
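(To give a flavor, roughly from memory - a typical MQL-style query asks for an artist's albums by leaving that slot empty:)

[{
  "type": "/music/artist",
  "name": "The Police",
  "album": []
}]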
The Ruby and Python bindings let you choose between JSON and the native hash/array or list/dictionary structures. You can be idiomatic and portable at the same time.
That's very cool. JSON seems a natural conceptual match for querying data like that. I think a different concrete syntax might be more readable for me - but could just be my unfamiliarity. And JSON is an instant, no-work solution, so there's no downside. It would be good to see your project.
I don't know the ant specifics, but I'd surmise that one starts off thinking of a simple declarative format (we've got some targets, that depend on other targets, which is just a simple tree, and perhaps we can write various analysis tools later). Then of course as time goes on, it gets more general purpose language features (symbol bindings, conditionals, etc), and adopts an ever more procedural feel (especially in the case of a build system, which essentially all about mutating filesystem state). Since these features weren't designed from the start, you end up with rifts, such as between ant "attributes" and "properties", which don't really tax the ant gurus, just the casual users and newcomers.
Seems that it would be easier to write a build system as an embedded DSL in a general purpose language to begin with, and when further analysis tools wanted to be written, make the necessary changes (meanwhile old build scripts run just fine using the old library).
But then again, it's much easier to criticize with hindsight than to design a system that becomes popular.
>But then again, it's much easier to criticize with hindsight than to design a system that becomes popular.
Indeed - ant came about when XML was popular for far more things than it should have been used for, and java wouldn't make a very good declarative build language (which is what ant was targeting).
The article is refuted both by reality -- the godawful messes that people have made out of XML, the complexity of the systems built on top of it, its failure to amount to more than a bloated machine format, and the exodus in favor of more lightweight notations like JSON -- and by itself: the author expects the reader to consider the proliferation of single-shot technologies built on XML as a good thing and not a Tower of Babel. He lists DTDs, RELAX, XML Schema, XPointer, XQuery, XPath (edit: whoops, I snuck that one in), XSLT, and CSS and cites the "decades of person-effort embodied in those specifications" as if it were an argument in their favor!
I found this statement in the article interesting: "The central idea of the XML family of standards is to separate code from data." It explains why all the systems that express programming constructs in XML are such monstrosities (including XSLT, which he cites approvingly and yet which totally violates this separation). I wonder what the author would say to the people who do this kind of thing? They're not using it according to specification?
Edit: some of the arguments are out of date, too. I don't know anything about Lisp documentation in LaTeX; the open-source Lisp world tends to generate HTML documentation from s-expressions, as for example here: http://www.weitz.de/cl-ppcre/.
Not really. I'm talking about what I observe in the hacker world, which is a thoroughgoing trend away from XML. Do you really see it otherwise? It's not all going to JSON of course.
Most of the XML stuff is in big enterprise projects and, for some value of "count", those just don't count. Last I checked the IT pundits were declaring SOA dead, after having milked it for a decade.
Hmmm... I don't think hackers ever went towards XML. The old C hackers hated it (too inefficient.)
The nice thing I see in XML is that it abstracts out grammars (using XML Schema / DTD). For JSON, a grammar isn't used - it's a nested tuple transmission format - sort of a dynamic type system, but without, er, types - just tuples that can contain anything. It's agile, and all you need in many cases. And JSON is a natural for web-client stuff.
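(For instance, a schema fragment like this - just a sketch, with made-up element names - declares the allowed shape of a <person> element, and a validator can check documents against it:)

<xs:element name="person">
  <xs:complexType>
    <xs:sequence>
      <xs:element name="name" type="xs:string"/>
      <xs:element name="age" type="xs:int"/>
    </xs:sequence>
  </xs:complexType>
</xs:element>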
BTW: who said SOA is dead? SOA doesn't solve any pressing problem, but all the vendors switched to it.
A friend who works in banking sent this to me, mainly because the two of us had been predicting it for years.
The post, incidentally, comes perilously close to saying that it's time to invent new bullshit acronyms because business people have stopped falling for "SOA". One could hardly ask for a better exposé of the IT racket.
Thanks. It's a bit odd, because I don't think anyone thought SOA would do that much. It's been always vendor-driven. At least it's better than CORBA, the previous monstrosity in that role. But yes, with the recession, there may be an opportunity for something better and cheaper...
> with the recession, there may be an opportunity for something better and cheaper...
Maybe, but I doubt it. When it comes to big companies, there are too many people making money off software development not getting better or cheaper. There's still too much of a culture gap between the execs who write the cheques and competent hackers. This creates opportunity for the incompetent-but-slick to step in as parasites. When I say incompetent, of course, I mean incompetent at software; they're quite competent at getting execs to write cheques. And that would be fine, except they're not adding any value (or at least not any value commensurate with what's spent). In other words, the market is simply inefficient.
Even when competent hackers work for such companies they are paid far less and have far less influence than the slickees. Moreover, the population of the competent is small, so they are drowned out demographically.
It will take a long time before the market rationalizes. I do believe this is happening, but slowly. One economic cycle won't turn it around, but I agree with you that it may help!
It's a cynical story, that of course has some truth to it. I like to focus on the visionaries, who are seeking a better way (to be more profitable). Once a new way is seen and proven, others (who are less comfortable with risk) become interested. If the time is right, and enough people are convinced, there's a revolution, and a new technology is adopted.
There's a range of people in any industry - I doubt that many fit the black-and-white stereotype that you paint. There are also real advantages of standardization (such as being able to swap different items in and out), which comes at a cost of inefficiency. The interfaces to those standards must be rigid (within tolerances), or you can't reliably swap the modules. It's a trade-off.
Re your first paragraph: we're talking about the same process. I just expect it to be slow. Very few people past a certain age change their way of thinking. It's possible that some kind of disruptive effect will occur that suddenly rationalizes corporate IT. I sure hope it does. But that's a tall order: even the internet didn't do that.
Re your second paragraph: Oh, come on. The XML standard doesn't allow two different programs that use XML to interoperate or one to be substituted for the other. It was never going to allow that, and it was obvious from the beginning that it was never going to allow it. It's like saying that if you're French and I'm German and we publish books using a standard font, we'll understand each other.
Two different programs that use the same XML Schema - that write to the same interface specification. It's just an API. No, not magic pixie dust, but it helps. I think we'll get standard XML Schemas (like the per industry efforts), for specific purposes. It's not really the XML that helps, but the standardization of the interface. But even that's hard. However, there's a lot of money to be saved and agility to be gained, so this (or something like it) is inevitable. It's a worthwhile endeavour.
I was mainly addressing your comment elsewhere about people who like to lock things down, and specify them upfront. For interfaces, you really do need to agree on some things, and be strict (within those tolerances). Someone changing an interface on you can be pretty frustrating.
My comment about locking things down doesn't apply to systems where it's necessary to agree on an interface. Obviously that's sometimes necessary and good. It's not the same thing as mandating a standard-for-how-all-interfaces-shall-be-defined-and-all-grammars-declared. Interfaces that actually work, in my experience, are worked out on a point-to-point basis. For the rest one needs simplicity and malleability - that makes it possible to do what one needs.
I worked with one of these industry-specific XML formats on another project. A bunch of oil companies took years to define it. Do you know what happened? First, the overwhelming majority of projects still use the old ASCII format which is much easier to work with. Second, those of us who tried out the new format soon found that the different vendors' implementation of this "standard" were incompatible with each other, and we had to write just as much vendor-specific code as before, only now the format was bloated and rigid and made it harder.
The whole approach just hasn't worked in practice, and if it were going to, it would have by now.
By the way (I can't resist one more comment), apropos this:
> The nice thing I see in XML is that it abstracts out grammars (using XML Schema / DTD)
Have you ever used XML Schema on a real project? I tried, on a nice meaty project, for perhaps a year. It turned out to be as awful to work with in practice as it sounds good in theory. It's the kind of thing people write design specs for, and then after the standards are ratified they write books about it, without ever actually themselves building anything. Meanwhile, pity the poor schmucks who get the book and try to use it on a real system, wondering what they're doing wrong for a year until they finally figure out that the problem isn't them.
To give you an example: what happens when you give one of these systems a chunk of data that doesn't match its nicely specified schema? Well, with the tools we were using at the time, you get something like "Error type error the int at position 34,1 of element XSD:typeSpec:int32 type invalid blah blah blah". What can a system do with that other than tell its poor user, "Invalid data"?
Now I suppose you'll tell me that we just picked the wrong validator. :)
I've seen this problem on several projects. I think the best fix is to use a schema-agnostic processor. With a schema-agnostic processor, you can work with content whether it validates or not. This is often handy, as it's easier to fix the invalid documents in the processing environment than outside of it. For example, inside the processing environment I can fix the problem by writing an XQuery.
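(For instance - just a sketch, with a made-up <quantity> element - a plain XQuery can pull out the offending values for inspection, and the server's update facilities can then patch them in place:)

for $e in fn:collection()//quantity[not(. castable as xs:int)]
return $e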
Disclaimer: I work for Mark Logic, which sells a schema-agnostic XML content server.
Error messages are difficult to do right, and it's one area where (for example) DSLs tend to fall down. You might have a beautiful DSL, and think that it's finished, because you - as the designer - don't make mistakes with it (perhaps because you're really smart; really know the tool; or really haven't used it). Even some fully fledged languages have poor error reporting.
For a grammar specification language (like XML Schema) to do a really good job, it really should also formalize how to specify error messages for that particular grammar. I'm not sure how hard it would be to do this, and I haven't seen any research on it.
An odd thing about XML Schema is that it's not very resilient - when this was supposed to be one of the cool things about "extensible" XML. The next version is a little better at this. But it sounds like in your case, you wanted to get an error (because there was a real problem), it's just that you couldn't trace where it came from, or what its meaning was in terms of the system. It sounds like a hard problem. BTW: would using JSON or sexps have made this problem any easier? I think it's much deeper than that.
Agreed about errors. A good error-handling design for system X often needs to be nearly as complex as the design of X itself, and more importantly, needs to have the same "shape" as that design; it needs to fit the problem that X solves, speak the "language" that X and the users of X speak. Typically the amount of work involved, and the importance of it, are badly underestimated. Usually people work on what they think of as the cool parts and neglect the rest. (This is the reason DSL error handling tends to suck.) Maybe they try to hack the rest in later. By then it's much harder -- you have to rework the kernel to allow for the right kind of hooks into it so your error messages can have enough meaning. The advent of exceptions, by the way, was a huge step backward in this respect. It made it easy to just toss the whole problem up the stack, metaphorically and literally!
Getting back to XML... it's unsurprising that XML Schema isn't resilient. It's one of the most rigid technologies I've ever seen. Rigidity gets brittle as complexity grows. The error-handling fiasco of XML Schema isn't an accident. It's revealing of a core problem. Don't think you can sidestep the issue just by saying, well it's hard. :)
> Would using JSON or sexps have made this problem any easier?
Sure. I've done it both ways, there's no comparison. It's not the data format alone that makes the difference, but the programming and thinking style that the format enables. These structures are malleable where XML is not. Malleability allows you to use a structure in related-but-different ways, which is what error handling requires (a base behavior and an error-handling one). It also makes it far easier to develop these things incrementally instead of having to design them up front. So this is far from the only problem they make easier.
With malleability, I think you're talking about low-level control, where you work directly in terms of the data structures that will be serialized as JSON. You might be translating between the domain data structures and the JSON structure; or they might appear directly as JSON. This is malleable in that you tweak it however you want; and it's simple in that you have direct access to everything. You can do validation in the same way. If something goes wrong, you have all the information available to deal with it as you see fit.
The wire format doesn't affect this approach - it could be JSON or XML. However, JSON and data structures map cleanly, because it's an object format already. To do the same thing with XML requires an extra level, and you get a meta-format like xmlrpc, which is pretty ugly.
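(For example - a rough sketch - where JSON would just say {"name": "foo", "size": 42}, the xmlrpc rendering of the same value is roughly:)

<struct>
  <member>
    <name>name</name>
    <value><string>foo</string></value>
  </member>
  <member>
    <name>size</name>
    <value><int>42</int></value>
  </member>
</struct>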
So I think you're talking about a kind of object serialization, with object-to-object data binding.
XML Schema is an attempt to factor out the grammar of the data structures, so that they can be checked automatically, and other grammar-based tasks can be automated. I think this is a worthy quest, succeed or fail. One specific failing we discussed was error messages.
I'm trying to grasp your point of view, and presenting what I think it is, so you tell me if I got it right or not (assuming you see this reply).
Incidentally, I was just parsing some XML Schema documents, and the error messages were more helpful than I expected - it gave the rule of the grammar that was causing problems. However, this rule looked like it was taken from the English specification of XML Schema, when it could be (and should be) automatically inferred from the machine readable version of the grammar (i.e. the XML Schema for XML Schema documents).
Interesting, thanks. Would you say this "malleability" issue would be addressed if the XML could be bound (databinding) to arbitrarily different object structures?
BTW: I meant that specific problem you mentioned (which was a non-conforming xml document) - how would JSON/sexps make that specific one easier to solve?
From the article: "Nor is it an accident of history that Lisp programmers never came up with these technologies for Lisp data. The central idea of the XML family of standards is to separate code from data. The central idea of Lisp is that code and data are the same and should be represented the same."
No, that's not the central idea of lisp. The closest "central idea of lisp" is that code should have a standard and convenient representation so it can be readily manipulated by programs.
Lispers separate application code from application data. They do so even when the application data is a program....
It is true that Lispers didn't engage in years of meta-whinging and defining transform languages, but the fact that the XML folk did was more of an accident of history. When the Semantic Web hype was in full swing, the Lispers were still licking their wounds from the AI winter.
And, how is that Semantic Web coming along? Pretty much where the Lispers left it....
More seriously, yes, binary formats have their problems. And ASN.1 in particular suffered/suffers from being pre-unicode (and thus having a number of different + mostly useless character set types).
But it seems to me that this nested tag-length-value structure (ASN.1 and protocol buffers) occupies a design sweet spot.
He makes some good points that I didn't consider when first I hated upon XML. I agree that XML is better for its problem domain than s-expressions, but in that case the whole article is an apples to oranges comparison.
He claims that syntax is important, otherwise we'd still be using binary formats. If that's true then the only reason syntax is important is for humans, because machines can read binary formats just as well as anything else. So why compare it to s-expressions instead of to a human-friendly ideal? Is XML the best human-friendly markup language possible? Yes his paragraph is similarly readable in both forms, but would it be if wrapped in XML namespace clauses and an XML Schema? Would the s-expression version be if wrapped in a macro to parse text outside quotes as significant text?
The purpose of XML shouldn't be to make everything more XML-y, but to make everything easier. Maybe it does, in big enterprisey systems and between large data silos. Look at the recent ODF v. Office XML file formats. Have any of you been tempted to process Microsoft Office documents directly because now they're XML they must be easier than the old binary formats?
It's still not human friendly - you can't use the Jabber protocol by hand over telnet like you can use POP3, SMTP, IRC.
> you can't use the Jabber protocol by hand over telnet like you can use POP3, SMTP, IRC.
There's an intriguing view that text can't be used by hand either - you have to use a "text shell". Wouldn't a fair comparison use an "XML shell"? If it knew the schema, it could even autocomplete/intellisense for you, showing you the available choices at that point... it's pretty cool how XML has factored out the grammar of a language, in a reusable way.
I don't know the Jabber protocol, but the difficulty I've experienced with web-based XML protocols (web services) is that the http header needs the length of the message, which is hard to do by hand - it's not due to XML (and a "http shell" would fix that...)
I have found using a lib to process MS document formats from binary is easier than direct XML. The XML formats look more like a memory dump than a marked-up document.
Well, that's OOXML. MS chose to employ the pure content model, so that text nodes and elements cannot be mixed as children of a root element. That leads to monstrosities like this (I forgot the actual element names, so I made them up):
<p>
<t>This is not bold.</t>
<t><b>This is.</b></t>
<t>This isn't.</t>
</p>
In a mixed content model schema, you could omit the extraneous <t> elements and just directly embed text nodes:
<p>
This isn't bold.
<b>This is.</b>
This isn't either.
</p>
That's just one of the many ways in which MS misused XML to make things worse instead of better.
Actually yes, now you mention it, it was Office Open XML. In any case you still need a library to help you out, it's really not that easy to go straight to the XML (at least for the casual user - I guess for implementers it's another matter).
I think when arguing about XML, many people forget its relationship to SGML. Many of the drawbacks in XML are design decisions made in order to maintain backwards compatibility with SGML tools that were in use when XML was devised.
In order to simplify parsing (and in order to enable parsing a well formed document without knowing its DTD), XML threw out much of the syntactic sugar that SGML had to offer, such as implicitly closed tags. For example, many people think that <tr><td>foo<td>bar... is invalid HTML because of the missing </td>. In fact, it is perfectly valid, because the HTML (SGML-)DTD specifies that a <td>-tag implicitly closes a preceding <td>. Most think it's a browser hack. It is not.
What both XML (and SGML) add over S-EXPR is support for expressing grammars, a well-defined validation mechanism, and quite good support for coping with different character encodings.
I'm surprised no-one mentioned SXML, Kiselyov's representation of XML in Scheme. His paper, "A better XML parser through functional programming", can be found here: http://okmij.org/ftp/papers/XML-parsing.ps.gz.
The introduction covers the difficulty of parsing XML fairly well, which is another way of saying that XML does have a fairly complex "model" behind it that Lispers often ignore.
"They are not identical. The aspects you are willing to ignore are more important than the aspects you are willing to accept. Robbery is not just another way of making a living, rape is not just another way of satisfying basic human needs, torture is not just another way of interrogation. And XML is not just another way of writing S-exps. There are some things in life that you do not do if you want to be a moral being and feel proud of what you have accomplished."
Wasting what might amount to thousands of man-years by forcing talented people (who might otherwise accomplish something meaningful) to build workarounds for an inferior technology is every bit as criminal as, say, embezzlement.
I have a corollary to Greenspun's Tenth Law, which is that:
Lisp programmers see everything as an ad hoc, informally-specified, bug-ridden, slow implementation of half of Lisp, and don't see other benefits it might have.
That is, Greenspun's Tenth Law is true - for Lisp programmers.
I came to this conclusion because of a tragic pair of research papers, which had a fantastic usability idea. The second half of the first paper took the focus off the usability, and developed it into a very simple functional language. In their next paper, they dropped the fantastic usability idea completely, and made it into a lisp. :-(
Some XML standards fell into a similar trap, by wanting languages that process XML to be themselves written in XML - such as XSLT. It's a nice abstract concept to be able to process yourself... but at the price of abominations like "i &lt; 10".
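(A typical fragment - the comparison has to be entity-escaped inside the attribute:)

<xsl:if test="$i &lt; 10">
  <xsl:value-of select="$i"/>
</xsl:if>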
Adam Bosworth pointed out that XML's XPath resisted this - by making XPath itself an embedded non-XML mini-language. Imagine an XML representation of path components - now that would be verbose!
In a politically expedient move, I'd like to point out that pg didn't fall into this trap: he made the DSL for users of Viaweb to customize their store to not be lisp (though an easily mapped subset, if I understand correctly.) It's a non-lisp mini-language.
Regarding XML: I'd always thought it was just one of many possible syntaxes for representing hierarchical data; and it really didn't matter which syntax you used. As in a lingua franca (or any standard), provided that it is barely adequate, the key thing is that everyone agrees on it. XML became the Chosen, de facto standard, because everyone was already familiar with HTML, propelled by the mass adoption of the web. So the question becomes: why did we get HTML (based on SGML), instead of S-expressions? The article gives reasons, but I guess the short of it is that if a group of people work towards a specific purpose for years, and are successful at it (as SGML was), it is probably a good base to start from if you want to do something similar, i.e. describe documents.
Also, more directly, if I imagine a large webpage described with S-expressions, I think HTML is a bit clearer.
Nitpick: The article omits that quotes (or apostrophes) must be escaped in XML attributes.
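(That is, inside a double-quoted attribute value the quote character itself has to be written as an entity:)

<a title="He said &quot;hi&quot;">hi</a>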
Very telling points about LaTeX - that like XML/HTML, it also uses named end-tags; and that Lisp documentation itself uses LaTeX instead of S-expressions - drinking their kool-aid; but not eating their dog-food.
> Very telling points about LaTeX - that like XML/HTML
No, it's a demonstration of ignorance. LaTeX wasn't written by Lispers, it is merely used by them. The fact that they find its design decisions acceptable must be weighed against the cost of their alternatives. That doesn't imply that they wouldn't have been happier with a more lispish syntax.
At the time that those decisions were made, LaTeX was pretty much the best alternative. The fact that Lispers, like almost everyone else in related communities, made that decision merely says that Lispers don't cut off their noses to spite their face.
The article suggests Lispers could have used sexp as a front-end to LaTeX, in the same way that XML was used as a front-end to LaTeX. Very easy to do.
> If S-expressions were easier to edit, it would be most logical to edit the document in S-expressions and then write a small Scheme program to convert S-expressions into a formatting language like LaTeX. This is, what XML and SGML people have done for decades [...]
> The article suggests Lispers could have used sexp as a front-end to LaTeX
(1) As another comment points out, they have, when doing so provided benefits.
(2) Lispers tend to be multi-lingual; they'll use other languages when appropriate. If XMLers can only work in XML....
> This is, what XML and SGML people have done for decades
Decades? 20 years/two decades ago is 1988. The first draft of XML is roughly 1998/10 years later/one decade ago. GML, a predecessor to SGML, didn't become public until 73 but the "multiple use" stuff was still in the future.
SGML rode the WWW wave, but that didn't happen for technical reasons.
Note that such front-ends are inherently leaky. If all they do is transform syntax, they're probably a bad idea.
Note that the point of using s-expressions as a front-end would be programmatic generation, not by-human editing. (Neither s-expressions nor XML is actually all that friendly for editing text.)
Do XML folks really write front-ends for ease of editing?