
The article is refuted both by reality -- the godawful messes that people have made out of XML, the complexity of the systems built on top of it, its failure to amount to more than a bloated machine format, and the exodus in favor of more lightweight notations like JSON -- and by itself: the author expects the reader to consider the proliferation of single-shot technologies built on XML a good thing rather than a Tower of Babel. He lists DTDs, RELAX, XML Schema, XPointer, XQuery, XPath (edit: whoops, I snuck that one in), XSLT, and CSS and cites the "decades of person-effort embodied in those specifications" as if it were an argument in their favor!

I found this statement in the article interesting: "The central idea of the XML family of standards is to separate code from data." It explains why all the systems that express programming constructs in XML are such monstrosities (including XSLT, which he cites approvingly and yet which totally violates this separation). I wonder what the author would say to the people who do this kind of thing? They're not using it according to specification?

Edit: some of the arguments are out of date, too. I don't know anything about Lisp documentation in LaTeX; the open-source Lisp world tends to generate HTML documentation from s-expressions, as for example here: http://www.weitz.de/cl-ppcre/.




Do you have references for that exodus?

By Google Trends, XML is winning 50 to 1 - but declining, and JSON is growing: http://www.google.com/trends?q=xml%2C+json&ctab=0

However, a factor is that people already know about XML and don't need to search for it; for example, HTML is declining even faster: http://www.google.com/trends?q=xml%2C+json%2C+HTML&ctab=...


Ooh, duelling URLs, can I play? :)

http://www.google.com/trends?q=xml%2C+javascript

Do you have references for that exodus?

Not really. I'm talking about what I observe in the hacker world, which is a thoroughgoing trend away from XML. Do you really see it otherwise? It's not all going to JSON of course.

Most of the XML stuff is in big enterprise projects and, for some value of "count", those just don't count. Last I checked the IT pundits were declaring SOA dead, after having milked it for a decade.


Hmmm... I don't think hackers ever went towards XML. The old C hackers hated it (too inefficient).

The nice thing I see in XML is that it abstracts out grammars (using XML Schema / DTD). For JSON, a grammar isn't used - it's a nested tuple transmission format - sort of a dynamic type system, but without, er, types - just tuples that can contain anything. It's agile, and all you need in many cases. And JSON is a natural for web-client stuff.
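
To make the contrast concrete (a throwaway sketch in Python; the payload is made up), JSON arrives as plain nested structures with no declared grammar, so any checking lives in the consuming code:

    import json

    # JSON: a nested, untyped structure. Whatever "grammar" there is lives
    # only in the producing and consuming code, not in a declared schema.
    payload = json.loads('{"order": {"id": 42, "items": ["widget", "gadget"]}}')
    print(payload["order"]["items"][0])  # -> widget

    # Nothing stops a producer from sending {"order": {"id": "42"}} instead;
    # if the type matters, the consumer has to check it by hand.
    if not isinstance(payload["order"]["id"], int):
        raise ValueError("order id should be an integer")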

BTW: who said SOA is dead? SOA doesn't solve any pressing problem, but all the vendors switched to it.


BTW: who said SOA is dead?

http://apsblog.burtongroup.com/2009/01/soa-is-dead-long-live...

A friend who works in banking sent this to me, mainly because the two of us had been predicting it for years.

The post, incidentally, comes perilously close to saying that it's time to invent new bullshit acronyms because business people have stopped falling for "SOA". One could hardly ask for a better exposé of the IT racket.


Thanks. It's a bit odd, because I don't think anyone thought SOA would do that much. It's always been vendor-driven. At least it's better than CORBA, the previous monstrosity in that role. But yes, with the recession, there may be an opportunity for something better and cheaper...


with the recession, there may be an opportunity for something better and cheaper...

Maybe, but I doubt it. When it comes to big companies, too many people are making money precisely because software development doesn't get better or cheaper. There's still too much of a culture gap between the execs who write the cheques and competent hackers. This creates opportunity for the incompetent-but-slick to step in as parasites. When I say incompetent, of course, I mean incompetent at software; they're quite competent at getting execs to write cheques. And that would be fine, except they're not adding any value (or at least not any value commensurate with what's spent). In other words, the market is simply inefficient.

Even when competent hackers work for such companies they are paid far less and have far less influence than the slickees. Moreover, the population of the competent is small, so they are drowned out demographically.

It will take a long time before the market rationalizes. I do believe this is happening, but slowly. One economic cycle won't turn it around, but I agree with you that it may help!


It's a cynical story, one that of course has some truth to it. I like to focus on the visionaries, who are seeking a better way (to be more profitable). Once a new way is seen and proven, others (who are less comfortable with risk) become interested. If the time is right, and enough people are convinced, there's a revolution, and a new technology is adopted.

There's a range of people in any industry - I doubt that many fit the black-and-white stereotype that you paint. There are also real advantages to standardization (such as being able to swap different items in and out), though it comes at a cost of some inefficiency. The interfaces to those standards must be rigid (within tolerances), or you can't reliably swap the modules. It's a trade-off.


Re your first paragraph: we're talking about the same process. I just expect it to be slow. Very few people past a certain age change their way of thinking. It's possible that some kind of disruptive effect will occur that suddenly rationalizes corporate IT. I sure hope it does. But that's a tall order: even the internet didn't do that.

Re your second paragraph: Oh, come on. The XML standard doesn't allow two different programs that use XML to interoperate or one to be substituted for the other. It was never going to allow that, and it was obvious from the beginning that it was never going to allow it. It's like saying that if you're French and I'm German and we publish books using a standard font, we'll understand each other.


Two different programs that use the same XML Schema - that write to the same interface specification. It's just an API. No, not magic pixie dust, but it helps. I think we'll get standard XML Schemas (like the per-industry efforts) for specific purposes. It's not really the XML that helps, but the standardization of the interface. But even that's hard. However, there's a lot of money to be saved and agility to be gained, so this (or something like it) is inevitable. It's a worthwhile endeavour.

I was mainly addressing your comment elsewhere about people who like to lock things down, and specify them upfront. For interfaces, you really do need to agree on some things, and be strict (within those tolerances). Someone changing an interface on you can be pretty frustrating.


My comment about locking things down doesn't apply to systems where it's necessary to agree on an interface. Obviously that's sometimes necessary and good. It's not the same thing as mandating a standard-for-how-all-interfaces-shall-be-defined-and-all-grammars-declared. Interfaces that actually work, in my experience, are worked out on a point-to-point basis. For the rest one needs simplicity and malleability - that makes it possible to do what one needs.

I worked with one of these industry-specific XML formats on another project. A bunch of oil companies took years to define it. Do you know what happened? First, the overwhelming majority of projects still use the old ASCII format, which is much easier to work with. Second, those of us who tried out the new format soon found that the different vendors' implementations of this "standard" were incompatible with each other, and we had to write just as much vendor-specific code as before, only now the format was bloated and rigid and made it harder.

The whole approach just hasn't worked in practice, and if it were going to, it would have by now.


By the way (I can't resist one more comment), apropos this:

The nice thing I see in XML is that it abstracts out grammars (using XML Schema / DTD)

Have you ever used XML Schema on a real project? I tried, on a nice meaty project, for perhaps a year. It turned out to be as awful to work with in practice as it sounds good in theory. It's the kind of thing people write design specs for, and then after the standards are ratified they write books about it, without ever actually building anything themselves. Meanwhile, pity the poor schmucks who get the book and try to use it on a real system, wondering what they're doing wrong for a year until they finally figure out that the problem isn't them.

To give you an example: what happens when you give one of these systems a chunk of data that doesn't match its nicely specified schema? Well, with the tools we were using at the time, you get something like "Error type error the int at position 34,1 of element XSD:typeSpec:int32 type invalid blah blah blah". What can a system do with that other than tell its poor user, "Invalid data"?
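
For what it's worth, here's roughly what that looks like with a present-day validator (a minimal sketch using Python's lxml; order.xsd and order.xml are hypothetical files). The error log is structured, but it still speaks the schema's language rather than the application's:

    from lxml import etree

    # Hypothetical schema and document, purely for illustration.
    schema = etree.XMLSchema(etree.parse("order.xsd"))
    doc = etree.parse("order.xml")

    if not schema.validate(doc):
        for err in schema.error_log:
            # Typical entry: "Element 'quantity': 'abc' is not a valid value
            # of the atomic type 'xs:int'." Accurate, but hard to turn into
            # anything friendlier than "invalid data" for an end user.
            print(err.line, err.column, err.message)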

Now I suppose you'll tell me that we just picked the wrong validator. :)


I've seen this problem on several projects. I think the best fix is to use a schema-agnostic processor. With a schema-agnostic processor, you can work with content whether it validates or not. This is often handy, as it's easier to fix the invalid documents in the processing environment than outside of it. For example, inside the processing environment I can fix the problem by writing an XQuery.
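
In rough Python terms (not Mark Logic's XQuery, and the element names are invented), the idea is: parse without validating, repair the offending content in place, then validate:

    from lxml import etree

    # Parse first, validate later -- the document stays usable even if invalid.
    doc = etree.parse("order.xml")
    schema = etree.XMLSchema(etree.parse("order.xsd"))

    if not schema.validate(doc):
        # Repair a known problem inside the processing environment, e.g. a
        # <quantity> element that sometimes arrives empty (invented example).
        for q in doc.iter("quantity"):
            if not (q.text or "").strip():
                q.text = "0"
        print("valid after repair:", schema.validate(doc))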

Disclaimer: I work for Mark Logic, which sells a schema-agnostic XML content server.


Error messages are difficult to do right, and it's one area where (for example) DSLs tend to fall down. You might have a beautiful DSL and think that it's finished, because you - as the designer - don't make mistakes with it (perhaps because you're really smart, really know the tool, or really haven't used it). Even some fully fledged languages have poor error reporting.

For a grammar specification language (like XML Schema) to do a really good job, it really should also formalize how to specify error messages for that particular grammar. I'm not sure how hard it would be to do this, and I haven't seen any research on it.

An odd thing about XML Schema is that it's not very resilient - when this was supposed to be one of the cool things about "extensible" XML. The next version is a little better at this. But it sounds like in your case, you wanted to get an error (because there was a real problem); it's just that you couldn't trace where it came from, or what its meaning was in terms of the system. It sounds like a hard problem. BTW: would using JSON or sexps have made this problem any easier? I think it's much deeper than that.


Agreed about errors. A good error-handling design for system X often needs to be nearly as complex as the design of X itself, and more importantly, needs to have the same "shape" as that design; it needs to fit the problem that X solves, speak the "language" that X and the users of X speak. Typically the amount of work involved, and the importance of it, are badly underestimated. Usually people work on what they think of as the cool parts and neglect the rest. (This is the reason DSL error handling tends to suck.) Maybe they try to hack the rest in later. By then it's much harder -- you have to rework the kernel to allow for the right kind of hooks into it so your error messages can have enough meaning. The advent of exceptions, by the way, was a huge step backward in this respect. It made it easy to just toss the whole problem up the stack, metaphorically and literally!

Getting back to XML... it's unsurprising that XML Schema isn't resilient. It's one of the most rigid technologies I've ever seen. Rigidity gets brittle as complexity grows. The error-handling fiasco of XML Schema isn't an accident. It's revealing of a core problem. Don't think you can sidestep the issue just by saying, well it's hard. :)

Would using JSON or sexps have made this problem any easier?

Sure. I've done it both ways, there's no comparison. It's not the data format alone that makes the difference, but the programming and thinking style that the format enables. These structures are malleable where XML is not. Malleability allows you to use a structure in related-but-different ways, which is what error handling requires (a base behavior and an error-handling one). It also makes it far easier to develop these things incrementally instead of having to design them up front. So this is far from the only problem they make easier.
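
To make "a base behavior and an error-handling one" concrete (a sketch with plain Python dicts, which is roughly what JSON or sexps give you; the field names are made up), the same walk over the structure can either pull values out or accumulate errors in the application's own vocabulary:

    def check_order(order, errors):
        # The error-handling pass has the same shape as the normal pass:
        # walk the structure, report problems in the application's own terms.
        if not isinstance(order.get("id"), int):
            errors.append("order is missing a numeric id")
        for i, item in enumerate(order.get("items", [])):
            if "sku" not in item:
                errors.append("item %d has no sku" % i)
        return errors

    print(check_order({"id": "42", "items": [{"qty": 3}]}, []))
    # ['order is missing a numeric id', 'item 0 has no sku']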


With malleability, I think you're talking about low-level control, where you work directly in terms of the data structures that will be serialized as JSON. You might be translating between the domain data structures and the JSON structure, or they might appear directly as JSON. This is malleable in that you tweak it however you want; and it's simple in that you have direct access to everything. You can do validation in the same way. If something goes wrong, you have all the information available to deal with it as you see fit.

The wire format doesn't affect this approach - it could be JSON or XML. However, JSON and data structures map cleanly onto each other, because JSON is an object format already. To do the same thing with XML requires an extra level, and you get a meta-format like XML-RPC, which is pretty ugly.
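
A quick way to see the extra level from Python (the record and the method name "save" are made up):

    import json
    import xmlrpc.client

    record = {"id": 42, "items": ["widget"]}

    # JSON: the structure maps straight onto the wire format.
    print(json.dumps(record))
    # {"id": 42, "items": ["widget"]}

    # XML-RPC: the same record, wrapped in a type-describing meta-format.
    print(xmlrpc.client.dumps((record,), "save"))
    # <methodCall><methodName>save</methodName><params><param><value><struct>
    #   <member><name>id</name><value><int>42</int></value></member>...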

So I think you're talking about a kind of object serialization, with object-to-object data binding.

XML Schema is an attempt to factor out the grammar of the data structures, so that they can be checked automatically, and other grammar-based tasks can be automated. I think this is a worthy quest, succeed or fail. One specific failing we discussed was error messages.

I'm trying to grasp your point of view, and presenting what I think it is, so you tell me if I got it right or not (assuming you see this reply).


Incidentally, I was just parsing some XML Schema documents, and the error messages were more helpful than I expected - it gave the rule of the grammar that was causing problems. However, this rule looked like it was taken from the English specification of XML Schema, when it could be (and should be) automatically inferred from the machine-readable version of the grammar (i.e. the XML Schema for XML Schema documents).


Interesting, thanks. Would you say this "malleability" issue would be addressed if the XML could be bound (databinding) to arbitrarily different object structures?

BTW: I meant that specific problem you mentioned (which was a non-conforming XML document) - how would JSON/sexps make that specific one easier to solve?


I recommend looking at RELAX NG or Schematron.


Re: Lisp documentation in LaTeX, the author may be referring, for example, to CLtL; see http://www.cs.cmu.edu/Groups/AI/html/cltl/clm/node3.html


Ah. Well, that was a book. I'll bet Steele would prefer LaTeX over XML too.



