XML files cannot be easily processed with standard UNIX tools like grep, sed, and AWK. XML requires specialized libraries and tools to process correctly, making it an extremely poor choice for... well, just about anything. It's a markup format for text, not a programming language.

Building software is a programmatic process. No XML please! We're decidedly not on Windows, and since I have the misfortune of fitting such square pegs into round holes, please don't use XML for applications which must run on UNIX. It's a nightmare. It's horrible. No!!!


There is no particular relationship between Windows and XML. And just to play devil's advocate, is the lack of XML support in grep, sed, and awk a problem with the data format or with the tools? Why can't we have new standard tools that operate on hierarchical formats such as XML / JSON / YAML? Current standard Unix tools have plenty of flaws and as forward thinking developers we shouldn't be afraid to replace them with something better.
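Here is a rough illustration of the kind of tool this comment asks for: a minimal Python sketch that turns an XML document into one line per attribute or text node, so grep, sed, and AWK can work on the result. The name `flatten_xml` and the `/path/@attr=value` output shape are assumptions made for this sketch, not an existing standard.

```python
import xml.etree.ElementTree as ET


def flatten_xml(text):
    """Yield one 'path=value' line per attribute and per non-empty text node.

    Paths look like /config/server/@port or /config/server/name, so the
    output can be filtered line by line with grep, sed, or AWK.
    """
    root = ET.fromstring(text)

    def walk(elem, prefix):
        path = f"{prefix}/{elem.tag}"
        # Attributes first, marked with '@' as in XPath.
        for name, value in elem.attrib.items():
            yield f"{path}/@{name}={value}"
        # Text directly inside this element (ignores whitespace-only text).
        if elem.text and elem.text.strip():
            yield f"{path}={elem.text.strip()}"
        # Recurse into child elements in document order.
        for child in elem:
            yield from walk(child, path)

    return list(walk(root, ""))
```

For example, `flatten_xml('<config><server port="8080"><name>web1</name></server></config>')` returns `['/config/server/@port=8080', '/config/server/name=web1']`, and a `grep '^/config/server/@port='` on that output selects a single setting. Text that follows a child element (ElementTree's `tail`) is dropped, which is acceptable for configuration-style XML but not for document markup.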


I have noticed a particular relationship between Windows, Java, and XML: all Java programmers nowadays seem to come from Windows (and then I end up with ^M CR characters in all the text files, even shell scripts!), use Java, and write configuration in XML.

YAML doesn't need any special tools - it's ASCII and can easily be processed with AWK, for example.

I don't know about you, but the last thing I want is to have to have a whole new set of specialized tools, just so somebody could masturbate in XML and JSON.

XML is a markup language. That means it's for documents, possibly for documents with pictures, perhaps even with audio. It's not and never was meant for storing configuration or data inside of it. XML is designed to be used in tandem with XSLT, and XSLT's purpose is to transform the source XML document into (multiple) target(s): ASCII, ISO 9660, audio, image, PDF, HTML, whatever one writes as the transformation rules in the XSLT file. XML was never meant to be used standalone.

If you really want to put the configuration into an XML file, fine, but then write an XSLT stylesheet which generates a plain ASCII .cf or .conf file, so its processing and parsing can be simple afterwards. XML goes against the core UNIX tenet: keep it simple.
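The XML-to-.conf route could look like this in practice. Python's standard library has no XSLT processor, so this sketch swaps in `xml.etree.ElementTree` to do the same one-way transformation. The `<option name="...">value</option>` schema and the `name=value` output format are hypothetical.

```python
import xml.etree.ElementTree as ET


def xml_to_conf(xml_text):
    """Render <option name="...">value</option> children as 'name=value' lines.

    Assumes a hypothetical flat schema: a root element whose direct
    <option> children each carry a name attribute and a text value.
    """
    root = ET.fromstring(xml_text)
    lines = []
    seen = set()
    for opt in root.findall("option"):
        name = opt.get("name")
        value = (opt.text or "").strip()
        if not name:
            raise ValueError("<option> without a name attribute")
        # A flat key=value file cannot express repeated keys unambiguously.
        if name in seen:
            raise ValueError(f"duplicate option: {name}")
        # Nor can it hold a value that spans several lines.
        if "\n" in value:
            raise ValueError(f"multi-line value for {name} cannot be flattened")
        seen.add(name)
        lines.append(f"{name}={value}")
    return "\n".join(lines) + "\n"
```

Generating the flat file once (for instance at install time) means the program and any shell scripts only ever read lines like `port=8080`, which `grep '^port='` or `awk -F=` handle directly. The duplicate and multi-line checks reject input that a line-oriented format cannot represent.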

Do you like complex things? I do not, and life is too short.


If you must have structured data, use a Lisp program. Congratulations on using a format that was designed to be executable. And if it's a build tool, you better believe it's executable. I suspect that Annatar is a Murray Hill purist (I don't know for sure), so he may disagree with me.

Of course, like any real programming language, it's hard to process with regex, but then again, I don't want to process makefiles with regex. And you might have some luck coaxing AWK or the SNOBOL family to parse it, and it would be far easier than doing the same with XML.
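To show how little machinery a Lisp-style file needs, here is a minimal s-expression reader in Python. It is a sketch, not a Lisp reader: it understands only parentheses, bare atoms, and double-quoted strings without escapes (no comments, quoting, or numbers), and the `(target ...)` build description below is a hypothetical format. The regex only tokenizes. Nesting is tracked with an explicit stack, since a regular expression alone cannot match balanced parentheses.

```python
import re

# One token per match: '(' , ')' , a "double-quoted string" (no escapes), or a bare atom.
TOKEN = re.compile(r'\s*(?:(\()|(\))|"([^"]*)"|([^\s()"]+))')


def read_sexpr(text):
    """Parse exactly one s-expression into nested Python lists of strings."""
    pos = 0
    stack = [[]]  # stack[-1] is the list currently being filled
    while True:
        m = TOKEN.match(text, pos)
        if not m:
            break
        pos = m.end()
        open_paren, close_paren, string, atom = m.groups()
        if open_paren:
            stack.append([])
        elif close_paren:
            if len(stack) == 1:
                raise SyntaxError("unbalanced ')'")
            finished = stack.pop()
            stack[-1].append(finished)
        elif string is not None:
            stack[-1].append(string)
        else:
            stack[-1].append(atom)
    # Anything left over (other than whitespace) is something TOKEN rejected.
    if text[pos:].strip():
        raise SyntaxError(f"unexpected input at offset {pos}")
    if len(stack) != 1 or len(stack[0]) != 1:
        raise SyntaxError("expected exactly one complete expression")
    return stack[0][0]
```

For example, `read_sexpr('(target hello (deps hello.c) (cmd "cc -o hello hello.c"))')` returns `['target', 'hello', ['deps', 'hello.c'], ['cmd', 'cc -o hello hello.c']]`.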

>please don't use XML for applications which must run on UNIX. It's a nightmare. It's horrible. No!!!

I'd disagree with you there. DocBook, HTML, and friends are all good applications of XML (or near XML), doing what XML was designed for: Document Markup.

Seriously people, when you're writing a program in a language that has "Markup Language" in the name, does that not ring any alarm bells?


Are you seriously suggesting that you can awk a makefile and get anything useful out?


Why would I need to AWK a Makefile, when make will take macro definitions as arguments on the command line?


You were the one complaining that you couldn't awk an xml file in the context of "xml versus makefile".


No, I wrote that XML for use in applications is bad, as it cannot be easily processed with standard UNIX tools. And it's most definitely bad for building software, as it is limited by what the programmer of the build software thought should be supported. A really good example of that is Ant/NAnt. make, on the other hand, doesn't limit one to what the author(s) thought should be supported. Need to run programs in order to get from A to B? No problem, put whatever you need in, and have it build you the desired target.


Yes. Don't use XML as an exchange format. Use JSON or DSV instead.

Yes, I said JSON. JSON is very easy to parse, and you can grab the values of unique keys (which is most of them) with this regex:

  /(,|\{)\s*"<key>"\s*:\s*(.*?)\s*(,|\})/


PCRE. So now you have to use Perl? And what happens when your single JSON record spans multiple lines, and has recursive structures?


First off, I simply used some PCRE syntax, as it's what I'm familiar with. \s could easily be replaced, and non-greedy matching is a relatively common extension.

As for when your record spans multiple lines, with recursive structures: the previous regex is for extracting simple atomic data from a JSON file, which is usually what you want in these cases anyway. If not, the json(1) utility can, I believe, extract arbitrary fields, and composes well with awk, grep, etc.
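To make the trade-off concrete, this Python sketch puts the regex approach (with `\s*` for optional whitespace) next to a real parser. The json(1) utility's options are not shown because it is unclear which implementation is meant; Python's standard `json` module stands in for it, and the dotted-path syntax in `grab_by_path` is an assumption of this sketch.

```python
import json
import re


def grab_by_regex(text, key):
    """Regex extraction of a simple atomic value, in the spirit of the thread.

    Only reliable for flat objects whose values contain no commas or braces;
    returns the raw JSON text of the value (strings keep their quotes).
    """
    pattern = r'(?:,|\{)\s*"' + re.escape(key) + r'"\s*:\s*(.*?)\s*(?:,|\})'
    m = re.search(pattern, text)
    return m.group(1) if m else None


def grab_by_path(text, path):
    """Parse the whole document, then follow a dotted path like 'a.b.0.c'."""
    node = json.loads(text)
    for part in path.split("."):
        # List indices are numeric path components; object keys are strings.
        node = node[int(part)] if isinstance(node, list) else node[part]
    return node
```

On a flat record such as `{"name": "web1", "port": 8080}`, `grab_by_regex` returns the string `8080` for `port` and `"web1"` (quotes included) for `name`. On a nested record such as `{"server": {"hosts": [{"name": "a"}, {"name": "b"}]}}`, whether on one line or several, `grab_by_regex(text, "hosts")` returns the truncated fragment `[{"name": "a"`, while `grab_by_path(text, "server.hosts.1.name")` returns `b`.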


Yes, the json utility can process a JSON file into key:value pairs. Now ask yourself: if you end up with key:value pairs on stdout, why couldn't that have been the format in the first place? Why artificially impose not one, but two layers of complications (one as JSON, the other as the specialized json tool to process JSON)? Why not just keep it simple and go directly to a flat file ASCII format to begin with?
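The flattening step described here can be sketched in Python as follows. The dotted-path key naming is an assumption of this sketch; other tools use different conventions.

```python
import json


def flatten_json(text, sep="."):
    """Turn a JSON document into (dotted.path, value) pairs, one per leaf.

    Object keys and array indices become path components, so
    {"a": {"b": [1, 2]}} yields ("a.b.0", 1) and ("a.b.1", 2).
    """
    def walk(node, prefix):
        if isinstance(node, dict):
            items = node.items()
        elif isinstance(node, list):
            items = enumerate(node)
        else:
            # A scalar leaf: string, number, bool, or None.
            yield prefix, node
            return
        for key, child in items:
            path = f"{prefix}{sep}{key}" if prefix else str(key)
            yield from walk(child, path)

    return list(walk(json.loads(text), ""))
```

Printing each pair as `f"{key}:{value}"` gives the flat key:value stream the comment describes (use `json.dumps(value)` if you want JSON spellings such as `false` rather than Python's `False`). The flat form does drop some information: `{}` and `[]` produce no lines at all, and a key that itself contains a dot becomes indistinguishable from a nested one.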


Well, it means not rolling your own parser. But that's not hard. The real advantage is when you actually ARE dealing with structured data, with nested objects. Most standard UNIX formats are bad at this, and sometimes you find it necessary.

Also, because JSON is so common, you get really good tooling for handling structured data by default, instead of kinda-okay tooling for 50 different, slightly incompatible formats. 10 operations on 10 data structures vs 100 operations on 1, and all that.

But for unstructured data, or for one-level key/value data, JSON is overkill. You can use DSV, like this:

  key1:value1
  key2:value2
  and so:on
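Reading that format takes only a few lines. This Python sketch splits each line on the first colon only, so values may contain colons but keys may not. Skipping blank lines and `#` comments is an added assumption, and backslash-escaped separators are not handled.

```python
def parse_dsv(text, sep=":"):
    """Parse 'key:value' lines into a dict, splitting on the first separator."""
    result = {}
    for lineno, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        # Assumed conventions: ignore blank lines and '#' comment lines.
        if not line or line.startswith("#"):
            continue
        key, found, value = line.partition(sep)
        if not found:
            raise ValueError(f"line {lineno}: no '{sep}' separator")
        result[key] = value
    return result
```

For example, `parse_dsv("key1:value1\nkey2:value2\nand so:on\n")` returns `{'key1': 'value1', 'key2': 'value2', 'and so': 'on'}`, and `url:http://example.com` keeps its full value because only the first colon splits.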
