More fundamentally, JSON embodies the basic metaphors we already use for data types: numbers, strings, lists, and maps. Those are far more intuitive and familiar than whatever metaphors XML comes up with.
Language designers take note: there is tremendous utility in making entities in your language isomorphic to entities expressed in previous languages.
I am actually using such isomorphisms in my current porting project. I'm literally getting a couple of orders of magnitude more productivity out of this compared to porting by hand.
I recently wrote a Threefish encryption program that I deliberately structured to look as much like the mathematical notation in the spec as possible. That made things so much easier that it was almost ridiculous, even if it made the resulting program very peculiar-looking.
You'll be especially enamored of their 200-line TCP/IP implementation, introduced on p17 and reproduced with documentation starting on p44. It's implemented as a grammar in their meta-language that parses the ASCII-art diagrams in the RFCs and executes them.
Observing the industry over the last couple of decades, I'm left with the feeling that sometime in the last ten years or so we all went XML crazy. The popularity of JSON is just the pendulum starting to swing the other way.
So count me in. I'll use JSON over XML for server-webclient data exchange whenever I can. It's just much easier.
The funny thing is, we all knew we were going XML crazy while it was happening. I can remember anti-XML bloat articles going way back. The thing with XML was, it DID solve some problems, it just tried too hard.
The good thing about XML was it was a standard interchange format that every language had at least a couple parsers for. That said, I'm not sure if it was worth the annoyance and irritation of using stuff like XML Schema, DTDs, SOAP, or XSLT.
SCNR is an old Usenet acronym and stands for "sorry, could not resist". I was a) being sarcastic and b) just having some fun with the list crowd. I totally agree that if the parens issue is a non-issue for you and you feel comfortable and productive writing Lisp code, then go ahead.
In fact, I have the utmost respect for people able to wrap their heads around that syntax.
If you really think "wrapping your head around" the almost non-existent syntax of Lisp is a reason to respect someone, then I suggest you delve a little further into your CS/programming studies and find many more wonderful and interesting things.
For the true rockstar coder "wrapping your head around" any reasonable syntax should be a triviality.
Maybe it goes back to my frustrations with Dreamweaver, but I always felt like an IDE only served two purposes: to prevent developers from really learning a language (autocomplete, IntelliSense, etc.) and to make up for poorly designed, overly verbose syntaxes.
Here we are in 2010 and most programming languages still contain tons of semicolons, parens, and curly braces, and the only reasons I can find for this are:
* Language Y looks like language X because everyone's comfortable with language X's syntax already
which is contradicted by
* Having trouble with curlybraces and semicolons? Your IDE will do that for you.
We need IDEs to deal with ugly syntax, we have ugly syntax because we've always had ugly syntax, and then we use IDEs to keep writing the ugly syntax.
And then I hear people deride VB, Ruby, and Python for their lack of curly braces and parens.
So, I don't think IDEs are the answer. I think better syntaxes are the answer. For example, I used Textmate's easy HTML completion shortcuts for a long time. Then I found HAML, and I didn't need to close tags anymore. I use Sass, and now my CSS doesn't need ; or {} anymore. Same with CoffeeScript.
C#, Java, Lisp, Perl, and JavaScript are all great, powerful languages, but I just get really uncomfortable hearing people defend the syntax with "just use your IDE". We're programmers. How come we're not interested in saying "hey, let's fix these things!"? If the argument is "IDEs make you more productive in language X", then doesn't that say something about language X?
Well, that's quite an extrapolation from my recommendation to use automation to match brackets/parens.
Python is almost there in terms of language simplicity, but I'd much rather count parens than count spaces/tabs. The former are visible. The latter are only visible through secondary effect.
Any tool can be abused. Not using automation for truly mindless tasks like counting parens is often just a knee-jerk reaction that precludes real thinking about the cost/risk/benefit of the tool.
Programming has always been full of idiots who turn off their brains. Maybe modern tools make this a bit too easy. I'd posit that one way to detect programmers with such proclivities is to discuss tools with them, and see if they give you back unsupported prejudice or reasoned analysis. (Really be wary of the ones who try to pawn off prejudice as reasoned analysis!)
For some reason, every time I let an editor automatically insert a closing parenthesis, quotation, or brace, I almost always end up with extras at the end, which is just as frustrating as, if not harder to debug than, having too few. Maybe it's the way I tend to backstep as I think my way through the code flow, but it never ceases to happen whenever I set foot in Eclipse or Visual Studio...
For some reason, every time I let an editor automatically insert a closing parenthesis, quotation, or brace, I almost always end up with extras at the end
User issue here. I never have the problem you refer to. Don't have the editor/IDE insert the parens/brackets. Have it check them for you. Emacs can do a momentary highlight of the opening paren/bracket whenever you type the close of the pair. I used to use that to make sure I'm writing exactly what I thought I was writing. If my code is too convoluted for me to know that I'm correct in a split second when seeing the open paren highlight, then I know it's time for me to refactor/rewrite the method.
In my present project, I use a similar facility in the Smalltalk browser.
"Don't have the editor/IDE insert the parens/brackets. Have it check them for you. Emacs can do a momentary highlight of the opening paren/bracket whenever you type the close of the pair."
NO. Sorry but you're doing it wrong. Relying on the editor to highlight parentheses is a highly inefficient and error-prone way to work. It's distracting, it wastes mental cycles and inhibits flow.
The correct way is to configure emacs so that when you press a certain key (in my case, right shift because I configured my keyboard so I can activate left shift with my thumb), emacs inserts "()" and puts the cursor in the parentheses. No need to remember or count closing parentheses this way.
It's been a really long while since I last counted parentheses for ANYTHING. Like, let's say I write a let with one variable, and the value is a really deeply nested expression, and now I'm done with it and I want to write the body of the let. What do I do? I DON'T COUNT PARENTHESES. I go back to the opening parenthesis for the variable declarations of the let, then I skip over that form (the variable declarations) with C-M-f (forward-sexp), right to the body of the let. Then I use C-j (newline-and-indent) and then I can write the body of the let. (If the value form is really long I might skip a few parentheses and use C-M-b to get back at or near the opening parenthesis of the variable declarations.)
So, executive summary: you walk the structure but you NEVER count parentheses. Like, once every four months I might screw up and delete a parenthesis, and then the editor and I are both pretty confused, but it's such a total non-issue. And even then, I don't count parentheses much; I try to walk the structure by skipping over forms and see what works and what doesn't, and I might tell emacs to reindent some of the code and immediately see what's wrong. With a proper structure editor, unbalancing the parentheses couldn't even happen.
> NO. Sorry but you're doing it wrong. Relying on the editor to highlight parentheses is a highly inefficient and error-prone way to work. It's distracting, it wastes mental cycles and inhibits flow.
> The correct way is to configure emacs so that when you press a certain key (in my case, right shift because I configured my keyboard so I can activate left shift with my thumb), emacs inserts "()" and puts the cursor in the parentheses. No need to remember or count closing parentheses this way.
No, sorry you are doing it "wrong!" :) I also use the insert () technique. I basically use whichever technique is optimal in context. Please note that your assertions about flow and distractions are highly subjective.
Yeah, sure. S-expressions are literals in Lisp, and can be "parsed" by calling eval. But I think that misses the larger point made by the OP. JSON is really useful even in languages other than JavaScript. JSON wins over XML because XML is too complex and thus ambiguous. But JSON wins over s-expressions too, because s-expressions are too simple. How do you represent John Smith as an s-expression? Sure, it can be done. But as with XML, software that consumes s-expressions has to know how to interpret their structure, even if it doesn't have to parse the surface syntax.
It's true that s-expressions don't give you a literal syntax for hashmaps or arrays, but note that some lisps (e.g. Clojure) do support these natively … and in a syntax that's better than JSON since you don't have to type all those annoying commas ;-)
You know, I never quite got that comma thing. In Python, for instance, the only reason the commas are necessary is that {'foo': 'bar' 'baz', 'k2': 'v2'} is the same as {'foo': 'barbaz', 'k2': 'v2'} thanks to string auto-concatenation. Getting rid of auto-concatenation would allow {'foo': 'barbaz' 'k2': 'v2'}, which is pretty easy to parse based on tokens and ':'. It would also eliminate a pretty common bug: you have one key/value pair per line, each line ends with ',', except the last line, which doesn't (the following line has the '}'); then someone adds a new pair after it and misses the ','.
Python lets you put the commas before the items, like so:
{ 'foo':'barbaz'
, 'k2':'v2'
}
Ruby doesn't like that though. Both Python and Ruby will ignore an extra trailing comma on the last item, which gives you another way of avoiding the common bug you describe.
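For what it's worth, here's a quick sketch of both behaviors (my own toy example in Python, not from either comment above):

# A trailing comma after the last item is legal, so every line can end
# with a comma and adding a new pair never touches the previous line.
d = {
    'foo': 'bar',
    'k2': 'v2',   # trailing comma is fine
}
assert d == {'foo': 'bar', 'k2': 'v2'}

# Forget a comma, though, and adjacent string literals silently
# concatenate instead of raising a syntax error.
oops = {'foo': 'bar' 'baz', 'k2': 'v2'}
assert oops == {'foo': 'barbaz', 'k2': 'v2'}   # 'bar' 'baz' became 'barbaz'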
> Yeah, sure. S-expressions are literals in Lisp, and can be "parsed" by calling eval.
No on both counts.
(1) S-expressions aren't literals.
(2) s-expressions are not "parsed" by calling lisp eval. Lisp eval's argument is an s-expression - it doesn't parse anything. (It also doesn't read anything.) Lisp read turns character sequences into s-expressions.
> But JSON wins over s-expressions too, because s-expressions are too simple. How do you represent John Smith as an s-expression?
If you'd like, exactly the same way you'd represent it in JSON, because lisp's read handles a superset of the JSON datatypes.
> software that consumes s-expressions has to know how to interpret their structure
As does JSON.
Take your example, John Smith. What JSON datatype do you expect to get? (JSON just has numbers, strings, booleans, null, arrays, and hashes, where the keys must be strings.)
Not enough data types in CL or Scheme's reader (Clojure wins here) and no standard between the lisps.
But there was something Lisp did right that no other mainstream scripting language has: it separated the reader from the evaluator. You can eval Perl or Ruby or Python, but you can't securely read it. This goes for JavaScript too: you can eval JSON, but you'd be an idiot to.
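Rough sketch of what that separation buys you, using Python as the stand-in (my own illustration; the payload string is hypothetical):

import json

# A pure reader turns text into data and nothing else.
data = json.loads('{"name": "John Smith", "age": 42}')
assert data["name"] == "John Smith"

# eval can "read" the same literal...
also_data = eval('{"name": "John Smith", "age": 42}')
assert also_data == data

# ...but eval is an evaluator, not a reader: hand it attacker-controlled
# text and it will happily run code.  The reader just rejects it.
payload = '__import__("os").getcwd()'   # stand-in for something nastier
print(eval(payload))                    # executes code
try:
    json.loads(payload)                 # malformed data, not code
except ValueError as e:
    print("reader refused:", e)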
Huh? CL's reader handles structs, hashes, and vectors in addition to strings, lists, atoms (with packages), lots of number formats (including Roman), and the ability to express general graphs, not just trees (yes, cyclic too). Plus some other things that I forget. (Arrays?)
> > It's probably bad you can express cyclic graphs. That means you'll have to watch for denial-of-service attacks phrased as cyclic data.
You can turn it off. However, if cyclic graphs (or general DAGs) are important, supporting them means that you don't have to roll your own.
> Programmable reader isn't the point. That gets you back into defining your own syntax. The point is to have a standard.
The writer and reader have to agree no matter what you do. A programmable reader means that you don't have to roll your own in more cases. And, it makes it easier to test the third-party writers. (The reader folks can just publish the read-table.)
The "XML requires you to build your own parse tree" argument isn't valid; XML libraries are fully capable of handing you a DOM-style tree, and of allowing you to pull things out of the tree without writing your own traversal code.
JSON just assumes messages are going to trivially fit into naive data structures, and so provides fewer options.
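For example, a minimal Python sketch (my own, with a made-up record) of letting the XML library hand you the tree and query it, next to the JSON equivalent:

import json
import xml.etree.ElementTree as ET

xml_doc = '<person><first-name>John</first-name><last-name>Smith</last-name></person>'
json_doc = '{"first-name": "John", "last-name": "Smith"}'

# The XML library builds the DOM-style tree and lets you pull values out;
# no hand-written traversal code required.
root = ET.fromstring(xml_doc)
print(root.findtext('first-name'))   # John
print([e.tag for e in root])         # ['first-name', 'last-name']

# The JSON library just hands back native data structures.
person = json.loads(json_doc)
print(person['first-name'])          # John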
I'm not sure if you are complaining about this aspect, but I would observe that "fewer options" is actually the feature here, not the bug.
A generic XML DOM is still complicated to deal with. Even if you do the "right thing" and use XPath, you still have to deal with XPath because you can't get around the fact that you have an underlying representation that has at least two dimensions (attributes vs. CDATA). That is, just as the article says, you have more degrees of freedom in how you represent your data, and what is a "degree of freedom" but a near synonym of "dimensionality"? You can't abstract around dimensionality very effectively without losing fundamental capabilities in the underlying component (in fact a staggering number of abstraction failures in general can be shown to come from exactly this problem if you really learn to think this way), and the complexity comes poking out in the XPath. It's still better than groveling over the DOM yourself, but it's probably also the absolute peak of concision that is obtainable; there will be nothing better.
JSON is indeed simpler in that you don't really have 3 or 4 feasible choices per attribute; {"first-name": "John", "last-name": "Smith"} is pretty much your choice, full stop. That leaves the underlying library fundamentally, not accidentally, simpler. This can get you into some trouble in some cases, for instance XML is a better choice for HTML-type tagged text as the JSON for tagged-text is just hideous (and, interestingly, reopens the dimensionality problem as there is no one obvious solution), but many things are fundamentally simpler than tagged-text.
If you want to pick up a defined serialization format, my gut says to default to JSON and fall back to XML if you really need it for something... but be aware that you may need it, and it's no better to try to jam JSON on top of a fundamentally XML-shaped problem. (Besides, your JSON can carry bits of XML in it without much pain, so "best of both worlds" is perfectly feasible.)
Righty-o, like I said I wasn't sure. :) But I figured it was still worth posting. I don't see much level-headed analysis of the issues. Too many devs got burned by XML then can't help but get a little fanboy-ish over JSON, which has distorted the dialog a bit, I think. And I like getting the idea of API dimensionality out there.
Actually, it's really easy to write, if you want to. I haven't had any need for it in my big JSON project (everything ends up pretty hierarchical, so a query like "give me all the object attributes that are 'xyz'" isn't useful), but if I needed one it would be easy to write. A good learning exercise for recursion, if you're not familiar with it.
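Roughly the kind of thing described above, as a hypothetical Python sketch (the function name is made up): find every place a given key appears in a parsed JSON structure.

def find_key(node, key, path=()):
    """Recursively yield (path, value) for every occurrence of `key`
    anywhere in a parsed-JSON structure of dicts and lists."""
    if isinstance(node, dict):
        for k, v in node.items():
            if k == key:
                yield path + (k,), v
            yield from find_key(v, key, path + (k,))
    elif isinstance(node, list):
        for i, v in enumerate(node):
            yield from find_key(v, key, path + (i,))

doc = {"a": {"xyz": 1, "b": [{"xyz": 2}, {"c": 3}]}}
print(list(find_key(doc, "xyz")))
# [(('a', 'xyz'), 1), (('a', 'b', 0, 'xyz'), 2)]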
What is the advantage of XML over JSON for nontrivial, non-naive data structures? Do you mean things like XLink/XPointer? (Or where can I go to read more about this?)
I've gotten the informal impression that incidental complexity (you know, complexity arising not from the problem but the solution) is a factor behind the hugeness of the XML ecosystem, but I'd be happy to learn otherwise...
> I've gotten the informal impression that incidental complexity (you know, complexity arising not from the problem but the solution) is a factor behind the hugeness of the XML ecosystem
You are absolutely correct.
As Phil Wadler put it: "The essence of XML is this: the problem it solves is not hard, and it does not solve the problem well."
It makes a trivial issue into a byzantine enterprise.
But, it has created a whole industry of 'experts', standard committees and other busybodies, so it must be good for the economy at least!
Come on. Any one of us could write a JSON parser in a page of code. An XML parser is significantly more work, both to write and for the CPU to run, because of how overcomplicated XML is. And it's larger. In fact I'm struggling to think of something XML has that is good. (And please no one respond with "it's extensible" or I may explode).
An XML parser has to track open tag names; with JSON it doesn't matter. XML has all stupid entities like &amp; which look ugly and need to be parsed.
I think "time to write a parser" should be a good metric for how sane a data format is. The fact that writing an XML parser that covers all bases/eventualities is a major undertaking says a lot about the data format.
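Since the "page of code" claim keeps coming up, here is roughly what that page looks like: a toy recursive-descent parser for most of JSON (my own sketch, in Python; it skips \uXXXX escapes and decent error reporting, so it's an illustration, not a library):

ESCAPES = {'"': '"', '\\': '\\', '/': '/', 'b': '\b',
           'f': '\f', 'n': '\n', 'r': '\r', 't': '\t'}

def parse(text):
    value, i = parse_value(text, skip_ws(text, 0))
    if skip_ws(text, i) != len(text):
        raise ValueError("trailing garbage")
    return value

def skip_ws(s, i):
    while i < len(s) and s[i] in ' \t\r\n':
        i += 1
    return i

def parse_value(s, i):
    c = s[i]
    if c == '{': return parse_object(s, i)
    if c == '[': return parse_array(s, i)
    if c == '"': return parse_string(s, i)
    if s.startswith('true', i):  return True, i + 4
    if s.startswith('false', i): return False, i + 5
    if s.startswith('null', i):  return None, i + 4
    return parse_number(s, i)

def parse_object(s, i):
    obj, i = {}, skip_ws(s, i + 1)
    while s[i] != '}':
        key, i = parse_string(s, skip_ws(s, i))
        i = skip_ws(s, i)
        if s[i] != ':':
            raise ValueError("expected ':'")
        val, i = parse_value(s, skip_ws(s, i + 1))
        obj[key] = val
        i = skip_ws(s, i)
        if s[i] == ',':
            i = skip_ws(s, i + 1)
    return obj, i + 1

def parse_array(s, i):
    arr, i = [], skip_ws(s, i + 1)
    while s[i] != ']':
        val, i = parse_value(s, i)
        arr.append(val)
        i = skip_ws(s, i)
        if s[i] == ',':
            i = skip_ws(s, i + 1)
    return arr, i + 1

def parse_string(s, i):
    out, i = [], i + 1
    while s[i] != '"':
        if s[i] == '\\':
            out.append(ESCAPES[s[i + 1]])
            i += 2
        else:
            out.append(s[i])
            i += 1
    return ''.join(out), i + 1

def parse_number(s, i):
    j = i
    while j < len(s) and s[j] in '-+.eE0123456789':
        j += 1
    text = s[i:j]
    return (float(text) if any(c in text for c in '.eE') else int(text)), j

print(parse('{"name": "John \\"J\\" Smith", "ids": [1, 2.5, null], "ok": true}'))
# {'name': 'John "J" Smith', 'ids': [1, 2.5, None], 'ok': True}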
XML brings a lot of other baggage to the party. Some of it can be useful. I particularly like XML namespaces, because when used properly there's hardly an equivalent in any other standardized serialization format I know. (And part of the problem is that you really need it standardized at the serialization format level for it to work; you can hack things into any other format but by definition you're not doing it in a standard way.)
However, that statement should be understood through the filter of the fact that I've only seen one thing that uses XML namespaces properly, and that's XHTML. Everything else I've seen gets it wrong, and that includes most things trying to deal with XHTML....
You also get a "free" and modestly powerful validation system, a serialization format that has seriously thought through encoding issues and has answers for them (JSON does too, but a lot of other fly-by-night stuff doesn't), and a fairly powerful format for tagged text (JSON-tagged text is a hack no matter how you slice it). You also get XSL, which floats some people's boats, though I wouldn't be caught dead working in it.
If you don't need any of that, don't use it. I don't very often. But when you need it, do. Also:
"An XML parser has to track open tag names, with JSON it doesn't matter."
This is equivalent to JSON needing to track {, [, ', and ", among other things. That's just parsing; both JSON and XML need to be parsed. That's not an advantage.
"XML has all stupid entities like & which look ugly and need to be parsed."
This is equivalent to the escape sequences in JSON: http://json.org/string.gif They also need to be parsed, they do not magically turn into bytes without that.
>> This is equivalent to JSON needing to track {, [, ', and ", among other things. That's just parsing; both JSON and XML need to be parsed. That's not an advantage.
JSON can simply count brackets. That makes for a very simple parser indeed. XML needs to cope with invalid nesting, end tag names not matching start tag names etc. End tag names are just wasted space.
>> This is equivalent to the escape sequences in JSON: http://json.org/string.gif They also need to be parsed, they do not magically turn into bytes without that.
But those are sane. We all escape double quotes and backslashes in pretty much every programming language. They make sense in a very simple encoding.
Why should I need to replace & with &amp;? They seem arbitrary, and the replacements aren't simple. &quot;?? Seriously? You're naming characters with odd abbreviations and then expecting people to remember those? Why not just escape them with a prefix such as, erm... "\"?
Well-formedness constraint: Element Type Match
The Name in an element's end-tag MUST match the element type in the start-tag.
XML parser rejects the document as soon as this occurs. No different than with `{]` in JSON.
Changing " into &quot; allows you to go to the next " without worrying about the contents before. The next " is the ending quote. Then you can resolve all the internals lazily... saving on processing time compared to JSON. For example "\"\\\"" will go through many branches and conditions. "&quot;\&quot;" is a simple jump over to the next " character.
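A toy sketch of that scanning difference (my own illustration in Python, not anyone's actual parser code): finding the end of an XML attribute value is a plain search for the next quote, while a JSON string forces a branch at every backslash.

def end_of_xml_attr(s, start):
    # Inside an XML attribute a literal '"' must be written &quot;,
    # so the next raw '"' always terminates the value.
    return s.index('"', start)

def end_of_json_string(s, start):
    # In JSON a '"' may be escaped with a backslash, so every
    # backslash forces the scanner to branch and skip a character.
    i = start
    while True:
        if s[i] == '\\':
            i += 2
        elif s[i] == '"':
            return i
        else:
            i += 1

xml_attr = 'name="&quot;John&quot; Smith" rest'
json_text = '"\\"John\\" Smith" rest'
print(end_of_xml_attr(xml_attr, 6))     # position of the closing quote
print(end_of_json_string(json_text, 1)) # same, after inspecting every backslash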
You know, I know we're just geeking out here so please don't read too much into me taking the devil's advocate position, but: yes, XML is harder to parse, but the underlying data that gets encoded in JSON and XML isn't substantially different.
In both cases --- and let's take the C implementation case --- you're still building a poorly specified buggy implementation of Tcl to actually hold the data and answer questions about it.
XML is a huge scam perpetrated on the software industry, but now it is too late, because a huge parasitic 'industry' has been built around it, and too many people (especially too many PHBs) have invested their reputations in XML being the ultimate standard for representing data.
I've seen that a billion times: if you're doing DOM, you have to pull the whole thing into memory and then run over it again. Most JSON-based storage systems I've seen recently are more record-based, so you can stream them through.
So I guess my beef on that one isn't specifically with XML; you could split the above snippet into separate entities... but I'll note that it still involves about 90% markup and 10% data. Not exactly the most efficient thing possible.
If the file is big, chances are that it is basically a sequence of relatively small chunks. You don't need SAX parsing for this. Just read the file one chunk at a time.
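For instance, if the big file is a sequence of newline-delimited JSON records (one record per line, which is an assumption on my part, but a common convention), streaming it in Python needs nothing SAX-like:

import json

def records(path):
    # Read one record per line; memory use stays proportional to a
    # single chunk, not the whole file.
    with open(path, encoding='utf-8') as f:
        for line in f:
            line = line.strip()
            if line:
                yield json.loads(line)

# Hypothetical usage:
# clicks = sum(1 for rec in records('events.jsonl') if rec.get('type') == 'click')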
The point here (in the orig article) was that in some cases you do (i.e. if all you have is a SAX parser - Obj-C iPhone SDK being singled out as an example where only a SAX parser is supplied by the SDK)
YAML has a fuckton of sugar, whereas JSON has as little as possible, but the biggest differentiator is that YAML has references: you can express cyclic graphs. That's why it's a natural fit for fixtures and seed data in Rails: it natively handles relational data.
Mostly it is a reaction to the overuse of XML, and to how and why in many situations using something other than XML is beneficial. If a large swath of people start using JSON without thinking about it, you will probably start seeing a similar reaction extolling the virtues of another format over JSON.
The trouble was that it was the all-purpose data exchange tool for a long time, and everyone has seen XML put to all sorts of ungodly purposes. If it were invented today, nobody would be bashing it, because it would be less widely used -- the few people who did decide to use it would be putting it to good use, so they would love it. Everyone else would be using something else that was simpler.
The argument that XML can have various different structures for storing a person's name, while JSON provides one simple solution, doesn't fly. You could run into something like
{
  "Person": {
    "property": {
      "type": "first-name",
      "value": "John"
    },
    "property": {
      "type": "last-name",
      "value": "Smith"
    }
  }
}
This begs the question, "Why would you do something that convoluted?" Well, you can ask the same of the XML examples, and the answer probably boils down to requirements (or incompetence?).
Actually it's very simple: if you don't need children, don't make them. I measured it once, and storing data in attributes parsed about 10 times faster in MSXML.
But if we know what he meant why does this matter? Language exists to convey meaning, and I think we all understood what he meant, so I don't see the problem here.
The problem lies in dilution: if people recognize a new meaning in an old phrase, it becomes more difficult to convey the old meaning. (Because you can't use that phrase any more.)
There is also the case where you think you've conveyed the new meaning while it hasn't actually caught on yet (meaning, you made a mistake). So, better to stay safe and stick to the old meaning while we can.
The dilution has already happened, and this cause is pretty much lost.
English is my second language. I've known the "new" meaning since I was a kid.
To date I've maybe seen the "old" meaning used a handful of times, other than in examples given by people trying to correct someone using the new meaning.
Outside of academia I'd be surprised to see it at all. I suspect it would be confusing to more people than would recognize it.
> The dilution has already happened, and this cause is pretty much lost.
Don't give up too easily.
> Outside of academia I'd be surprised to see it at all.
That's because it's originally an academic term for a logical fallacy. This is like the abuse of 'eigenvalues' by all kinds of crackpots, and we should never stop fighting that kind of language abuse. We can't keep inventing new terms just because others have hijacked the previous one.
OK, for this particular cause. I was arguing more for a general stance: trying to slow down the rate of change in languages. The slower the change, the less confusing the language.
"But if we know what he meant why does this matter? Language exists to convey meaning, and I think we all understood what he meant, so I don't see the problem here."
I often don't know what the user meant. It makes for extra work to stop and think what possible meaning was intended. Same thing when people use "less" for "fewer": I have to think, did they mean "lesser" or "fewer"?
I respect the difference, but I'm afraid there's not much point in trying to educate people. This has already been an utterly lost cause for decades (at least).
I prefer JSON over XML for most applications, but one advantage of XML is that its strict structure aids parsers. With XML it's easy to see if you've closed all of your tags, etc.; it's all about the parser.
Conversions between XML and JSON can be a challenge (defining namespaces, etc.). The Google Data API handles this very well. Check out their nice side-by-side example: http://code.google.com/apis/gdata/docs/json.html
If you are talking about JSONP: JSON is by no means required to do that. You could in theory easily pass a string containing XML to the callback function.
There are so many cool things you can do with JSON. I just created a boolean logical statement builder on a web page using free-form open and close parentheses and AND/OR radio buttons. A little regex to change the parentheses into square brackets, then an eval(), and I can walk the whole statement recursively.
How? I'm evaluating strings that are built with my JavaScript code or from my server code, not arbitrary user input from other users. Yes, the user is able to enter parentheses into a textbox, and those become part of the evaluated string, but I regex-replace out everything but the actual parentheses.
I guess the point you all are trying to make is that some javascript text could have been maliciously inserted into the page somehow, and accidentally get eval'd simply because eval is in use, but the page below says that one of the only times eval should be used is to build up complex mathematical expressions. Is there a safe way to build up such expressions? Send it to the server, and invoke a JS engine there? The reason I made my original comment was because I found the eval function to be so helpful in this scenario, b/c I didn't need to use any type of syntax parsing.
If you are confident about charsets and you whitelist down to known-good characters ([A-Za-z0-9_ \t]) I have nothing snarky to say about the design. Otherwise, try reading this very short thread:
Yes, but the security of that part of your system could be pretty implicit, hence less than robust in the face of maintenance coding. The idea that "this is safe because everything but the parens are filtered out" won't necessarily jump out of the code at whoever maintains it.
If the code is structured to reveal this intention explicitly, then job well done.
(Note: It's not always a good idea to rest the future security of your system on a comment in the code!)
Rather than "regex replace out" problematic elements, you should immediately toss the data back to the user for correction. Trying to "correct" a potential hacker's string is often a losing proposition.
It seems to me like it's only dangerous in the context of XSS or bad server-side validation. With Firebug I can run any arbitrary JS on any website I like, even change what's already there. But that just affects me, unless the site uses what I can modify on the server and doesn't validate it there.
The question to ask is: is the JSON coming from a trusted source (e.g. generated by your server, and you can be reasonably sure that you have no funny XSS holes)? If the answer is yes, eval()-ing JSON is perfectly safe. If the answer is no, you need to parse it without the help of the JS interpreter.
The argument is always the same, and the JSON crowd always asserts "superiority" derived from sheer simplicity. XML is too hard for them, and they're part of some bizarre quasi-political movement that trashes object-oriented principles. Thus, "simple" trumps validation via types, version control (and hence interoperability over time without breakage and maintenance), and robustness.
The weakness in the JavaScript "eco-system" is its quasi-support (the 'quasi' qualifier applies frequently in this eco-system) of objects. XSD/XML is powerful when used in an object paradigm, and object paradigms have proven (not just via community claims) to be very effective.
The question is, when will the JS community step up to the plate (i.e. mature)? So much energy is wasted now on making JSON work -- just in order to make Javascript easy. Wrong priorities.