Shameful confession: when I was first introduced to JSON, I was convinced it would go nowhere. "XML already does everything JSON does! And there's no way to differentiate between nodes and attributes! And there are no namespaces! And no schemas! What's the point of JSON?" And a couple of years later I looked up from the tangled mess of XSLT I was working on to discover that the rest of the world had moved on.
JSON is just javascript, and it leaves out everything else. That's its entire reason for existing, and it caught on because that's all that 99.44% of anyone needed.
Timestamps you can add today, without changing the protocol; just put them in your data if you need them. So I'm not sure what he's even proposing there.
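For instance, nothing stops you from shipping something like this today (field name mine, purely illustrative):
{ "id": 38793, "captured": "2016-08-20T21:12:26.231Z" }
The receiver just has to know that "captured" holds a timestamp string.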
Schemas: OK, he doesn't like JSON Schema's anyOf. Fair enough. There's no real proposal here for how to fix it, so not much to say here.
Replacing commas with whitespace sounds to me like replacing a very minor irritant with a constant full-body rash. Stuff like his example of "IDs": [ 116 943 234 38793 ] would lead to far more confusion and errors than the occasional stray trailing comma.
So I guess I pretty much vote no on this one thanks for asking
> Replacing commas with whitespace sounds to me like replacing a very minor irritant with a constant full-body rash
Both vivid and accurate.
In order to combat trailing commas I normally place the comma on the following line, e.g.
{ "time": "3 minutes past four"
, "age": 229
, "sex": "monoecious species"
, "appearance": "Tree-like. It's a tree."
}
uint8_t *data // Yada
, *buffer // Ya
;
var javascript
, variable = 6
, declarations = [ "This is taking too long", "Yep" ]
, mix = [ "Flour", "Sugar", "Water" ]
, types = { 'old' : (d) => { return d < new Date }
, 'new' : (d) => { return d > new Date }
, 'borrowed': (o) => { return false }
, 'blue' : (c) => { return true }
}
, regularly = new Date().toISOString()
;
With such a format there is only ever a problem deleting the first line, which I find is much much harder to do without also noticing what you've done to the larger statement.
I came across this formatting pattern in Haskell, but I still prefer trailing commas for one reason: I can trivially apply line-wise operations (e.g. sort or align) on the key-value pairs without breaking the syntax. When I sort your first snippet line-by-line, it becomes
, "age": 229
, "appearance": "Tree-like. It's a tree."
, "sex": "monoecious species"
{ "time": "3 minutes past four"
}
and the syntax is broken. With trailing commas, the syntax always stays valid:
{
"age": 229,
"appearance": "Tree-like. It's a tree.",
"sex": "monoecious species",
"time": "3 minutes past four",
}
Yes, I'm aware of this being an error in JSON (which is precisely the point of the discussion). I wasn't aware that it's a problem in C. Maybe gcc is more lenient here.
No: tptacek did not bother paying attention to the context; as far as I know "uint8_t * data, * buffer, ;" (spaces added after * to avoid italics) would never be valid.
How is that the context? In the sense of the intersection between Javascript and JSON, trailing commas have pretty much always worked in C, which is something all of us who write code that generates C rely on, like a lot.
"Timestamps you can add today, without changing the protocol; just put them in your data if you need them. So I'm not sure what he's even proposing there."
He is proposing to add a timestamp type to the grammar. This would have the advantage that there would be one canonical way to have timestamps in your JSON. It would also mean that the parser would already validate them for you and you would not have to do that yourself every time.
> He is proposing to add a timestamp type to the grammar. This would have the advantage that there would be one canonical way to have timestamps in your JSON.
There is already a canonical way to have timestamps: encode them as defined in RFC 3339.
Done.
The proposals add nothing to JSON and don't make sense. I mean, screwing up the language just to add a very specific data type, one that is used only in a specific corner case and isn't even supported in the use case provided as an example? Nonsense.
> There is already a canonical way to have timestamps: encode them as defined in RFC 3339.
That's not canonical at all. It's not even the de facto standard for timestamps in JSON; most specifications I see call for ISO 8601.
> Done.
Well no, not done, because you then have to wrap it in a string, which essentially hides it from the JSON parser altogether and moves responsibility for parsing it out of the JSON parser into your application. As far as the JSON parser is concerned, it could be any old string, not a timestamp.
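To make the division of labour concrete, here's a minimal Python sketch (the field name and format string are my own illustration, not from the article):
import json
from datetime import datetime

doc = json.loads('{"captured": "2016-08-20T21:12:26.231Z"}')
raw = doc["captured"]  # as far as the JSON parser is concerned, just a str
ts = datetime.strptime(raw, "%Y-%m-%dT%H:%M:%S.%fZ")  # the app does the real parsing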
> a very specific data type that is used in only a specific corner case
I don't see how you can call timestamps a corner case – the article says "Among the RESTful APIs that I can think of, exactly zero don’t have timestamps." – and my experience is the same. Pretty much every API I've worked with has used timestamps in some form or another. They aren't a corner case at all – aside from links, they are probably the most common data type without direct support in JSON.
> That's not canonical at all. It's not even a de facto use of timestamps in JSON; most specifications I see call for ISO 8601.
Nonsense. ISO 8601 is an ISO standard that defines representations of dates and times.
Date and time representations aren't timestamps.
RFC 3339 is based on ISO 8601 but is designed specifically to handle timestamps by including provisions that tackle interoperability problems that ISO 8601 does not handle.
> Well no, not done, because you then have to wrap it in a string
Nonsense.
JSON is just the language that is used as a superset for other domain-specific languages. How the other language is defined (and parsed) isn't handled by the JSON parser, obviously.
Considering Tim Bray's example, if a domain-specific language specifies that a "Capture time" string is followed by a string storing an RFC 3339 timestamp, that's precisely what the parser for the domain-specific language is expected to parse. The document remains valid JSON, and the domain-specific language remains valid.
> I don't see how you can call timestamps a corner case
Because it is. It isn't a primitive data type, nor is it required to implement generic data structures. Hell, some programming languages don't even support any standard date and time container or aggregate data type. Timestamps are a very specific data type whose application is limited to very specific corner cases that are already handled quite well by other means.
> RFC 3339 is based on ISO 8601 but is designed specifically to handle timestamps by including provisions that tackle interoperability problems that ISO 8601 does not handle.
So in other words, what I'm saying isn't "nonsense" at all, and they aren't fundamentally different things as you claim – you just prefer a standard that you think is better?
In any case, you're getting away from the point there. The point was not which standard was better, the point was that RFC 3339 is not canonical.
> > Well no, not done, because you then have to wrap it in a string
> Nonsense.
It's not nonsense in the slightest. You can't put an RFC 3339 timestamp into JSON without wrapping it in a string, at which point it is no longer part of JSON. All JSON sees is a string, not a timestamp.
> The document remains valid JSON, and the domain-specific language remains valid.
I never said that it wasn't valid JSON, my point was that as far as a JSON parser is concerned, it's a string, not a timestamp, so parsing a timestamp has to be handled by your application not the JSON parser.
> It isn't a primitive data type
When the subject of discussion is whether or not it should become a primitive data type, that's circular logic.
> nor is it required to implement generic data structures.
No, but it is required to implement a vast number of JSON-based APIs.
> Timestamps are a very specific data type whose application is limited to very specific corner cases
The "very specific corner cases" being every single RESTful API the author can think of? This is an incredibly common use case, I don't see how you can argue that it's a corner case when it's practically ubiquitous.
> So in other words, what I'm saying isn't "nonsense" at all, and they aren't fundamentally different things as you claim – you just prefer a standard that you think is better?
It's nonsense, because a timestamp isn't a date representation. This was already demonstrated. I don't understand why you decided to ignore this.
> The point was not which standard was better, the point was that RFC 3339 is not canonical.
No, the point is that the ISO standard you've quoted doesn't define timestamps. Hence, the example you provided to refute what I've said was nonsense.
> You can't put an RFC 3339 timestamp into JSON without wrapping it in a string
...and you can't encode a date without representing the year as a number, the month as another number, the day as another number, etc etc etc.
You, somehow, miss the point that a primitive data type is not required to represent timestamps.
In fact, you can represent timestamps in JSON by defining an aggregate type.
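Something like this, say (the shape is entirely illustrative, not a standard):
{ "captured": { "year": 2016, "month": 8, "day": 20,
                "hour": 21, "minute": 12, "second": 26.231,
                "offset_minutes": 0 } }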
Timestamps as a primitive data type don't make any sense if it's possible to use the types that are already available to represent them.
> my point was that as far as a JSON parser is concerned, it's a string
Somehow, you don't understand that JSON is only the superset language, and that JSON-based domain-specific languages represent specific subsets of JSON obtained by imposing other parsing rules.
> When the subject of discussion is whether or not it should become a primitive data type, that's circular logic.
You somehow already forgot that this particular data type is already representable by using another primitive data type.
> No, but it is required to implement a vast number of JSON-based APIs.
No, it's not.
Wrap it in a string. Done.
If that's too hard to do, just specify an aggregate data type.
Is it that hard to understand?
> The "very specific corner cases" being every single RESTful API the author can think of?
Somehow, no RESTful API was barred from being implemented in JSON because of this corner case.
I suspect you are, somehow, confusing "convenience" with "necessity".
Are you proposing that timestamps as described in RFC 3339 be put inside quotes and just be string values in JSON? That's the simplest way I can think to do it, but I've had adamant protest to that idea.
If you are not suggesting that timestamp values be wrapped in quotes, then wouldn't you have to worry about every existing parser out there tripping on them?
The former, I assume. It sounds fine to me. If one is worried about fragmentation, create a standard on top of JSON that provides rules for dealing with other types. Layering 101, people!
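For instance, a layer on top of JSON could tag extended types without touching the grammar at all; the "$type"/"$value" names here are just one hypothetical convention:
{ "captured": { "$type": "timestamp", "$value": "2016-08-20T21:12:26.231Z" } }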
I was thinking along very similar lines when reading the article by Tim Bray. JSON is designed for JavaScript compatibility; I almost stopped reading when he said:
> They’re inessential to the grammar, just there for JavaScript compatibility.
The entire article goes on to suggest extremely terrible solutions to the problems he is pointing out.
In order:
Commas: If commas are an annoyance, does that mean there is no annoyance with XML markup? You can forget a closing tag just as easily, probably more easily. Just get a JavaScript or JSON linter. Fixed. If you are really annoyed by commas, move to YAML or another format.
Schemas: There's lots of potential here. However, again, if you're going to change JSON so much, why not just pick another data format that has better schema support?
JavaScript compatibility is the least important part of JSON. Lots of people use JSON with Python or Ruby or Rust or Java or C for things that will never interact with JavaScript. And anyway, you don't want people to parse JSON with eval, or to confuse JSON with things that include string concatenation or function calls; adding non-JavaScript syntax helps make that clear.
There are good reasons not to change JSON (backwards compatibility, updating parsers, confusing formats, etc.), but since you're not planning on parsing it with eval, there is no good reason why any changes made in a JSON 2.0 would have to conform to a subset of JavaScript.
I think if breaking changes are made, there is no point calling it JSON 2.0; it is an entirely new data format. When there are hundreds to thousands of alternatives, trying to "fix JSON" isn't the solution. JSON works because it is simple and easy to follow, robust and flexible, with no restrictions. There is no good way to store dates in XML or many other formats either, without the use of schemas, and the proposed syntax changes are minimally beneficial: they wouldn't save a lot of bandwidth after gzip, and the text overhead likely doesn't add up to much disk space compared to the data being stored. So removing the commas that make JSON easy to parse, and that helped its adoption and usage, just so some people can have fewer linting errors, doesn't seem like a large benefit.
> If comma's are an annoyance, is that to mean there is no annoyance with XML markup? You can forget a closing tag just as easily
That's not the same thing at all. The commas in JSON are equivalent to the whitespace between attributes in XML. A missing closing tag in XML would be equivalent to forgetting to close an object literal or array in JSON. So no, there's no equivalent XML annoyance to what he's asking for, XML is already fixed in the manner he suggests.
>JSON is designed for JavaScript compatibility, I almost stopped reading when
Why does Javascript compatibility matter in 2016? The original idea for JSON might have been to parse untrusted data using js eval() but I'd hope that nobody is doing that anymore.
I don't believe the original purpose of JSON was to eval untrusted objects, since that would always have posed a security risk. It did, however, offer a very easy, standard way to share and parse data back in 1999. In 2016, I believe continued backwards compatibility is a necessity; otherwise you are just creating a new data storage format. So removing commas doesn't seem like an acceptable improvement, especially when the recommendation comes out of a necessity due to human error while hand-editing. JSON is incredibly easy to read and write by hand; in comparison with XML it is far easier to parse and traverse through code due to its significantly simplified structure. If, however, you need more functionality, switch to a different format; why bother trying to mangle or force JSON to work where it doesn't? There are other technologies and standards out there, use them.
The issue is not about forgetting something. XML elements are self-contained; you don't need to separate them, thus you don't need to know the context. If you generate XML programmatically, each function can just push its object into the stream and be done. With JSON you need to know if your object is last or first at this level of the hierarchy, so you know whether you need to insert a comma or not.
Commas were meant to be used in code that you actually type. Using them as separators in a format that is meant to be produced and consumed by computers is clear, simple, and wrong.
Except you can put comments in your Javascript ... I'd love to have comments in JSON. It's not so important for machine generated files but when you hand-craft an example it's nice to annotate it.
> I'd love to have comments in JSON. It's not so important for machine generated files but when you hand-craft an example it's nice to annotate it.
Luckily, Douglas Crockford has an explanation and an almost prophetic solution[1] for your use-case:
I removed comments from JSON because I saw people were using them to hold parsing directives, a practice which would have destroyed interoperability. I know that the lack of comments makes some people sad, but it shouldn't.
Suppose you are using JSON to keep configuration files, which you would like to annotate. Go ahead and insert all the comments you like. Then pipe it through JSMin before handing it to your JSON parser.
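In Python that pipeline is a couple of lines, assuming the third-party jsmin package (which strips // and /* */ comments):
import json
from jsmin import jsmin  # pip install jsmin

commented = """
{
    // how long to wait, in seconds
    "timeout": 30
}
"""
config = json.loads(jsmin(commented))  # {'timeout': 30}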
> So I guess I pretty much vote no on this one thanks for asking
Yeah, this is precisely the kind of stuff people said would cause JSON to "lose" to XML -- how could you ever build anything without schemas and types and bunches of metadata to ensure you were using it correctly?
Today JSON is not really tied to JavaScript that much; I deal with it all the time in Python code that doesn't include or interact with JavaScript. JSON is just dictionaries and lists (and strings and numbers and nulls); this is pretty easy to express in dynamically typed languages and not that hard in statically typed languages if you give up a bit of (static) typing. I don't see any reason why it should be limited to a subset of JavaScript, though there are some good reasons for not changing it at all.
I mostly agree, except I still like and use XML in many cases. XML Schema validation is just too useful as a first pass validation step to verify the "shape" of a message/document.
And to me, with any decent editor with folding, it is easy to read. And very importantly, allows comments. I have never understood all the XML-hate out there...
While I respect tbray, we need to fix engineers who try to 'fix JSON'; absolutely nothing is broken with JSON.
It is nearly a perfect spec for the role it plays. It has taken off because it is so simple. I am constantly amazed that something so simple keeps being jabbed at with more complexity by engineers.
The job of an engineer is to make things that are complex simple, and JSON did just that. Make a new standard for your fixes and watch them spiral out of control like XML. Making a simple format that works is damn hard, and JSON is a nearly flawless attempt. It is baked into the ecosystem, so let's not break it. It was a response to XML complexity/verbosity, for use in JavaScript first; it is so damn useful that it is used everywhere now. Why? Simplicity.
In my opinion there are five problems with JSON in practice:
1. Trailing commas should be allowed. This would fix the entire comma ordeal the author is complaining about without introducing awkward whitespace logic.
2. Date time values should be a distinct type using the format of JS's Date.prototype.toISOString method. Had this been in the language when JSON was originally defined I'm sure it'd already be in JSON.
3. There should be a way to represent Infinity (for IEEE 754 compatibility) and possibly NaN.
4. Comments would be nice because JSON is often used for configuration and documenting JSON snippets can get awkward without them.
5. There's no real way to represent binary data other than base64-encoded strings (see the sketch after this list). It would be nice if there were at least some hex format (e.g. `"blob": <dead beef cafe babe>` with optional spaces for readability), but I can see why some people might be opposed to the idea of adding this to the language.
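For point 5, the status quo workaround looks like this in Python (values purely illustrative):
import base64, json

blob = bytes.fromhex("deadbeefcafebabe")
doc = json.dumps({"blob": base64.b64encode(blob).decode("ascii")})
print(doc)  # {"blob": "3q2+78r+ur4="}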
Although I'm personally not _as_ bothered by 5, the rest of these are spot-on. The other big annoyance for me is the inability to use anything other than string as an object key (numbers, for example, but others would be interesting/useful too I guess).
I'm a bit wary of the second and fifth one. You could add an arbitrary number of datatypes. Also, I think the common argument against comments makes much sense: they could get abused easily for parsing directives etc. JSON isn't that nice for configuration files anyway, IMO.
I would actually take away a feature of JSON: Unicode escaping. It's not needed anyway and astral plane characters being encoded as surrogate pairs is an added complexity for encoders/decoders. The selection of escape codes for control characters also is rather arbitrary.
The argument about abusing comments with parser directives is totally bogus. You can create parser directives with extra field names too! Worse, people create comments anyway using fields like "//" -- but they're much crappier than real comments, since they have to be a valid string value. We gave up a solution to a real problem in favor of a bogus solution to a theoretical problem.
- No support for NaN and Infinities (i.e. lack of full mapping to IEEE 754).
Also at least on Chrome, JSON.stringify(-0) gives 0, though JSON.parse("-0") does give -0. Weird..
As far as schemas are concerned, I find it nicer to not use schemas but access the JSON through an interface that checks things on the fly (types, existence), effectively verifying the JSON while you're examining it. E.g. things like x.get_int('name'), which throws exception if it doesn't exist or isn't an integer. Possibly even with an error message explaining where exactly and what is wrong (like: foo.bars[4].bazes["abc"]: attribute "qux" does not exist).
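A minimal sketch of such an accessor in Python (the class and method names are hypothetical, not from any particular library):
class Checked:
    def __init__(self, data, path="$"):
        self.data, self.path = data, path

    def get_int(self, name):
        # existence check, with a message saying where in the tree we are
        if not isinstance(self.data, dict) or name not in self.data:
            raise KeyError("%s: attribute %r does not exist" % (self.path, name))
        value = self.data[name]
        # type check (bool is a subclass of int in Python, so exclude it)
        if not isinstance(value, int) or isinstance(value, bool):
            raise TypeError("%s.%s: expected an integer" % (self.path, name))
        return value

Checked({"age": 229}).get_int("age")  # 229; get_int("name") would raise KeyError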
With the exception of trailing commas, all these things would break the fact that JSON is a subset of JavaScript. Breaking that would cause a lot more problems than it solves. It's just that the minor problems you have to live with seem worse than the major problems you don't.
What value is there in having JSON be a subset of JavaScript? In fact, it would be a good idea to make it not be a subset of JS, because that would prevent people from evaling it.
It's also (almost) a subset of Python, and possibly other languages. The benefit is that it is trivial to remember, because you already know the syntax. And you can not just eval it (which you of course shouldn't do if you didn't write the JSON yourself), but more importantly you can just copy it into your code. Or start with an object literal in code and, when it grows, or you want to use real-world data, switch to an external data source.
There are a couple of projects which extend JSON. I believe an ideal JSON superset would be compatible with ECMAScript 6 object literals, and have comments and trailing commas.
Yes, one of my main uses of JSON in my personal projects (and at one of my previous companies) is to serialize a Python dict to disk or network (and to read a Python dict from disk or network, of course). If I keep the members to int/float/str/dict/list, it works beautifully, and if I need to serialize anything more complex, I can always use jsonpickle.
Don't forget also that large integers are perfectly valid JSON (which imposes no semantics on numbers except that they are representable in decimal), but do not survive translation to JavaScript, which has only IEEE floats.
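A quick Python illustration of the asymmetry:
import json

n = 2**53 + 1  # 9007199254740993: perfectly fine as JSON text
print(json.loads(json.dumps(n)) == n)  # True: Python round-trips arbitrary ints
print(float(n) == n)                   # False: an IEEE double rounds it away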
The loose specification of JSON types is my biggest gripe as the maintainer of a JSON library. If JSON mandated that all strings be representable as UTF-8 byte sequences (after resolving escapes) and that all numbers be representable as IEEE 754 doubles, it would be much easier to provide sane and consistent defaults for encoding and decoding. As it is, all implementations have to make explicit choices as to how to handle things like escaped strings that represent invalid UTF-8 byte sequences (like `"\uD800"`) or numbers unrepresentable with a double (like `1e400`). There isn't even any sort of consensus on how to decode a number representable as an integer: some implementations produce integers, others floats.
Isn't the simple solution to decode e.g. a number as a special type, say JSON_number, which can fully represent what's on the wire? Then the user can access the number via one of a number of conversion functions depending on the native data type he/she wants (e.g. JSON_number_to_float). Is there really a huge benefit to eagerly performing this conversion?
That's a simple solution, yes, but also a verbose one. The primary appeal of JSON as a data exchange format is that it maps to ubiquitous types like objects/hashmaps, arrays, strings and numbers. If you need to hint the decoder or manually perform conversions, it loses a lot of its appeal as a user.
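Python's json module is one existing example of exactly that cost: the default is lossy, and the lossless path is extra ceremony on the caller's side.
import json
from decimal import Decimal

print(json.loads("1e400"))                       # inf: the default parse_float is float
print(json.loads("1e400", parse_float=Decimal))  # 1E+400: lossless, but needed a hint
print(json.loads("38793"))                       # 38793, decoded as an int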
Using JSON in javascript to transport data from some backend language to the browser in a simple and easily-parseable format is probably the most common use case for it - and no one has to worry about JSON support on the frontend as long as it's a valid subset of javascript, because it "just works" as long as javascript works.
Breaking that would only lead developers to write code to unbreak it.
Most major browsers now support non-eval JSON parsing, so that's a moot point – but it still requires the document to be valid JSON, and also to be usable in JavaScript out of the box.
JSON has the same thing going for it as the h.264 video codec – it may not be _the_ best/most open/etc. format around, but it is one of the few that is supported out of the box by all major browsers and Just Works™ (with its shortcomings).
At this point, I'd settle for JSON5 (http://json5.org/), which says any object that's valid ECMAScript 5 is valid JSON5 (including trailing commas, unquoted keys, single-quoted strings, and comments). I don't think it's very popular at this point, though, but I guess I've never tried to use it.
JSON is not optimized to be written by humans (I'd argue even JS is not well optimized for that </joke>).
There are other formats for that, like YAML and TOML[1].
When JSON is needed for human-to-computer stuff, I'd say TOML is pretty much always a better choice. A common use case for TOML is config files. And I see a lot of config in JSON lately; this makes me sad. (Though not as sad as the config-in-XML thing the Java ecosystem suffered from.)
Counter-point: you can tell if the JSON you received is intact or was cut off, because it won't parse properly without the closing braces. You don't have that safety net with YAML or TOML.
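For example (PyYAML assumed for the YAML side):
import json, yaml

print(yaml.safe_load('a: 1\nb:'))  # {'a': 1, 'b': None}: the cut-off value slips through
json.loads('{"a": 1, "b":')        # raises json.JSONDecodeError: truncation is caught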
YAML is a pain to parse. And as a human, having to remember quoting string values like "yes" and "on" to prevent them from being interpreted as booleans in YAML really grinds my gears. I don't care if it's prettier; YAML can't disappear quick enough.
I wrote a haiku some years back to demonstrate how broken the Ruby implementation was.
YAML:
takes a no for a 'no': no
spring,summer,fall,winter: 3,6,9,12
the game is: on
Running it through ruby 2.3.1 yields..
{"YAML"=>
{"takes a no for a 'no'"=>false,
"spring,summer,fall,winter"=>36912,
"the game is"=>true}}
>>> load(x, Map({"YAML": MapPattern(Str(), Str())}))
{'YAML': {'spring,summer,fall,winter': '3,6,9,12',
"takes a no for a 'no'": 'no',
'the game is': 'on'}}
>>> load(x, Map({"YAML": Map({"spring,summer,fall,winter": Str(), "takes a no for a 'no'": Bool(), "the game is": Bool()})}))
{'YAML': {'spring,summer,fall,winter': '3,6,9,12',
"takes a no for a 'no'": False,
'the game is': True}}
I know this is from Tim Bray, one of the primary authors and/or maintainers of XML, RDF, Atom, and JSON, but while I too find some annoyances with JSON, I accept them because:
- JSON was never designed to be hand-editable. It's a lucky accident that it's more palatable to edit than XML, but there are other formats like YAML, TOML, and even .ini that fulfill that purpose.
- While JSON was originally envisioned as a subset of Javascript and intended to be materializable by JS eval(), this very quickly turned out to be a bad idea. Once we all switched to using parsers, there is no more compelling need to remain compatible with Javascript, other than human recognizability of syntax, and, y'know, compatibility with 'old' JSON; and
- JSON's greatest strength is that it has such a good impedance match to native 'abstract' data structures supported in just about every contemporary programming language: strings, numbers, lists, maps. When people say 'simple', I don't disagree, but I'm convinced they're actually referring to this feature. Adding additional types -- yes, even timestamps -- would break this property, as datetime handling in most languages leaves a lot to be desired.
The first language that comes to mind when I hear "datetime handling in most languages leaves a lot to be desired" is, slightly ironic considering JSON's origins, javascript. If you ever touch a date in JS you HAVE to pull in something like moment, full stop. JS Date is a steaming pile of shit.
To go on a tangent, Java8 dates are "awesome" if you've used Joda-Time before, for the rest of us it's a pain in the backside.
I found myself spewing lots of code to just do Moment one-liners. That made me appreciate Moment much more, to the extent that I sometimes just eval a Moment script with Nashorn at times.
On the JSON part, I agree with everyone, my go-to language for most things is JavaScript, and I'm okay with dates in JSON. I think it's RFC 3339 as everyone says (2016-08-20T21:12:26.231Z).
Date parsing could be better when moving across languages, but I mostly prefer sending around millis since Epoch. Considering that people mainly look at their JSON for debug purposes, anyone seeing 1471727663538 should recognise that it's a date.
Other commenters put it well that the author should rather propose a new language, JSON is fine the way it is. Typed successors like Protobufs don't have native date parsing, but I guess being binary, they take away the commas too. I'm comfortable hand-writing JSON or editing it when needed. Most text editors help out by highlighting errors and keeping track of parentheses.
To continue the tangent, perhaps you read my mind, because I believe that Java8 and Joda (and their ports) are the only two libraries that do datetimes properly, and every other one confuses and/or conflates human calendar and clock concepts with absolute point-on-a-timeline timestamps, and the like.
But this illustrates the point: datetime is hard and we don't agree, so even going so far as retrofitting a native RFC 3339 datatype into 'official' JSON is, in my opinion, too far.
I'm fairly new to date manipulation in Java, between LocalDateTime, Instants, TemporalAdjusters etc. I get confused, but I'll catch up.
The comment I replied to mentions Moment for JavaScript, which I would add to your list. It's beyond excellent, great documentation as with most good quality libs, and very intuitive.
I agree with your point and opinion. Someone mentioned on another thread that engineers are there to simplify things, 'official' JSON has done a darn good job at that, no surprises across languages there.
I thought that the Date API was closely copied from Java? I do agree it is nasty to use.
Tim's ideas are just dangerous - they ruin compatibility for questionable gains. Why not just invent his own standard and call it something else, rather than trying to bust what we have already?
If so, when it was copied, Java had shit DT handling. The lack of a concept of timezones, and the auto-converting of a UTC time to local time, make life a living hell as soon as your code has to span more than 1 TZ.
The proposal doesn't fix JSON; it introduces an entirely new format, one which I would not be able to use without a 3rd party library that parses it. If I'm already using JavaScript, native parsing is one of the many benefits JSON has in the first place.
Timestamps are easily included when I want them, I tend to prefer UNIX time as it's just so much more impact proof. With this proposal can I use UNIX time if I want?
I don't have any problem with JSON. If I did I would have a problem with JavaScript itself, and while there's plenty to complain about I wouldn't propose mixing more formats into it as a solution. There isn't enough care being taken not to introduce additional complexity into the JavaScript world.
Dave Winer's remark from 2006 offers some historical perspective:
"I've been hearing, off in the distance, about something called JSON, that proposes to solve a problem that was neatly solved by XML-RPC in 1998, the encoding of arrays and structs in a format that could easily be processed by all programming languages. The advantage being that you could easily support the protocol in any language that supported XML and HTTP, which, at the time, was quickly becoming all languages."
Then there was someone, much younger than Dave Winer, who wrote a response to this (during 2006), and who said (paraphrase) "When I'm older, I look forward to my XML moment; my horror at something that the young kids are doing."
I wish I could find that second quote, but I worry it is lost in the mists of time.
If you're going to drop the commas in the dictionary definitions, why not drop the colons too?
{ "key" "value" 42 37 }
It's a rhetorical question; I think removing the commas is a bad idea. Troubles while hand editing json shouldn't be an overriding use case that drives the syntax.
All of those complaints about json and nothing about how you can't have comments?? I feel like Tim Bray and I aren't even on the same planet.
var config = HJSON.parse(fs.readFileSync('config.hjson', 'utf8'))
Another more obscure, and more powerful serialization format that does away with commas (commas are treated as whitespace!), has a date type, and much more is Rich Hickey's (of Clojure fame) EDN. https://github.com/edn-format/edn
I thought the major irritant was having to add double-quotes around keys? Commas never bothered me personally, but having to use double quotes is actually irritating.
It comes from the flow of editing: remove the last element of a list (which typically is on its own line), and forget about the fact that there is a comma at the end of the previous line.
But both optional commas and optional double-quotes are provided by Hjson <https://hjson.org/>. They do not advocate its use in protocols, however.
- Getting standards passed is hard (especially for popularly used standards), but can be supported by latent usage in the community. You can create a plugin for a few languages that automatically allows the timestamps and doesn't require commas. Get a bunch of people using it, and you'll be on your way to actually getting it in the standard in the future (still debatable in this specific example, but broadly, usage never hurts for getting something into a standard)
- If your concern is your own frustrations (rather than benefits of the vast majority of people) - I highly recommend a simple editor-level mapping that translates from your preferred DSL to the underlying standard
I agree with those who say JSON should never change, but should one day be replaced. It behooves us to think about what that replacement should be.
My own peeves:
* Numbers are stored in decimal but usually interpreted as binary with no guidance about rounding.
* The concatenation of valid JSON blobs is never valid JSON.
* The empty string is not valid JSON.
The last two can be "fixed" by a format called "arraySON" where the global blob represents not a JSON object (dictionary), but the contents of an array plus an optional trailing comma.
Or, given some of the ideas on this thread, perhaps a mandatory leading comma.
I'd see a trailing or leading comma as potentially a null or undefined value; it's intentionally against the spec. I personally see the concatenation issue in particular with streaming: streaming JSON objects and parsing them on the fly is a pain, usually requiring a SAX-style parser. To concat two objects, though, I feel like you could just merge the two – if you are okay with appending two arrays, you could merge two JSON objects if you had to. Numbers I've seen issues with; I've had to store numbers as strings before. There is guidance, however: the official ECMA-404 spec for JSON says there can be any number of digits following a "." in a number, optionally followed by an "e", "+" or "-" and more digits. Parsing is then limited by the data types available to the parser. Since JavaScript itself is limited to 17 significant digits, that would be the default maximum for JSON IMHO. However, libraries for different languages may not have 17 digits available in their native decimal types.
If you did all of that you'd still fall way short of what you have been able to do with XML for probably 10 years. XSD defines structure in terms of inheritance, composition, strong types, numerical ranges, enumerations, referential integrity, namespaces.
Requiring that a markup language (that must be sent across the wire between multiple buggy parser implementations) be hand editable is a recipe for disaster, and will never be as flexible as building/modifying an appropriate AST in a scripting language.
There's not even a good reason to read HTML anymore. You hit Ctrl+Shift+C in Firefox (or I think it's J in Chrome) and you view the actual machine-parsed DOM structure. The only reason human-editable/readable formats were ever necessary is a lack of appropriate dev tooling.
Those things aren't a goal or desired feature of most JSON use. There's a reason people hated SOAP. XML is better suited for documents. For everything else, XML doesn't work very well.
The best way is to require one comma between elements but allow one trailing comma after the last element. This way you can both do multi-line and one-line lists nicely.
The stricter syntax allows JSON to be directly parsed outside of JavaScript in other languages like Python. This is a nice feature and one of the selling points of JSON.
JSON values low variance between different representations of logically equivalent documents. It's not a primary concern (e.g. the explicit lack of key order allows a lot of variance), but it seems to trump syntax convenience just about everywhere except whitespace. Going all unquoted-keys would also be low variance, but this would restrict keys from "any string" to "valid javascript identifiers" or something bespoke in between.
I'm no expert, but I'd imagine fully dropping the quotes wouldn't allow you to use the same syntax as a map/dictionary in Javascript, whereby the keys could be [almost] any type, and string keys could contain characters such as - or even : which otherwise would prove ambiguous to a Javascript parser.
I don't know the history of why design decisions were made in JSON, but one might be the allowance for spaces in keys and mirroring the usage within arrays (where commas and spaces may be used in a string, but shouldn't be confused with a record separator).
Per my other comment, there are a variety of plugins on the editor side that will make JSON look like this (e.g., vim-json). To write out to JSON from that format, you can obviously just write a plugin (in JS this is one line).
I don't know who the OP is but I'll assume he's an authority in JSON since he talks about how he wrote some RFCs and stuff.
Anyway, this is why I think there should be a better way to move tech standards forward than leaving it up to a small number of people who are so far ahead of the curve that they think everyone else has the same problems they do.
In my opinion there is nothing wrong with commas. I am sure the OP had his own issues while working too much with JSON, but for 90% of the population it's not a problem – commas are even a better way to represent data.
As an example, [116, 943, 234, 38793] is 100% clear and intuitive. Anyone who has seen any array representation will get what this means. But change it to [116 943 234 38793]? There are so many ways you can interpret this. I guess the only people who would think this is better than the 100% intuitive [116, 943, 234, 38793] option are JSON nerds who think too much about this (and when I say JSON nerds I don't mean JSON "users", but people who try to come up with the standard)
As someone who really hates what's going on with javascript (with all the new syntax that needlessly sacrifices simplicity and intuitive nature for the sake of making things shorter), I can't really think nicely about these opinions.
I think these standards that are used by a huge population should move forward in a more democratic way. Or at least let more "normal" people make the decision.
Commas irritate me too. But I think removing them from The Serialization Format Known as JSON is a mistake.
My workaround is to use CSON, since I'm already using CoffeeScript. It adds the syntactic sugar where I'm hand-editing JSON, yet it doesn't impose my tastes on the rest of the world.
Dude, really? Commas are not a big deal, and you're talking about a major change to get rid of them: making whitespace matter. Currently whitespace doesn't matter; in order to get rid of commas, it has to. That should not be glossed over or treated lightly.
I don't agree with the "timestamps" part, for several reasons:
1) the author does not mention in his example the use of timezone offsets (which are a part of RFC 3339); that's a bit irritating
2) Timestamp parsing and emitting sucks, no matter the language, because the format specifiers are wildly inconsistent across platforms. I prefer the good old UNIX timestamp, because with it there's no complex massaging needed in order to operate with the timestamp.
Also, the author is missing comments entirely - no matter if shorthand // or long-form /* */ gets included, JSON desperately needs commenting ability; but on the other hand I'm worried about compatibility with older systems...
2) The JS Date type has a "toISOString" function that emits an ISO date time string like the ones used in the examples. Since everything else is already based on what JS does, why not just move that behaviour into the JSON spec as well?
Could you please elaborate? I'm vaguely familiar with S-expressions from my cursory inquiry into lisp (planning to take the plunge in November), but I don't see how they can be applied in this context.
S-expressions are really just a way of representing a tree of data (in Lisp code, the S-expression literally represents the abstract syntax tree that is used for the code).
I feel like YAML might be a bit too complicated in the other direction, with many ways of encoding quite complex things and the attendant risk of bugs. (E.g. the Python binding, PyYAML, can't even read its own output correctly in all cases.)
Commas aren't a huge deal for me personally but if there were to be a "json2", dealing with them and removing the double-quoting requirement for keys would make authoring a little easier. Ultimately these are issues that are easily solved with a linter though.
A standard date type would be huge however, and I'll add to that built-in support for comments. These two things would expand the scope of what you could do with JSON. JSON is already used in so many places for config files but without comments it can be a poor choice.
JSON is fine for what it was intended to be: a way to serialize fundamental data structures to disk or over the network.
If you want something optimized for human editing, such as for a configuration file or a DSL, we already have YAML for that. Hell, JSON is a valid subset of YAML, so you don't even need two parsers.
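That subset relationship is easy to check (PyYAML assumed):
import yaml

print(yaml.safe_load('{"IDs": [116, 943, 234, 38793], "ok": true}'))
# {'IDs': [116, 943, 234, 38793], 'ok': True}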
JSON is flawed in every way possible, IMO. It was intended to be edited and read by humans, yet it has no comments, no trailing commas, no unquoted keys. It was planned for universal exchange, yet it has no dates, no full-range floats, no byte arrays in readable form. It can't have backreferences for graph exchange or per-packet string interning.
Even the "obvious syntax" argument is wrong. "Key": value is not what comes to mind unless your parents spoke to you in JS.
Why it is still here: legacy that was easier to read and write than the SGML-flavoured monster. The other arguments are nonsense.
Hand-editing JSON may not be the most important way of interacting with it...
In that case, the arguments in favor of breaking long-established compatibility for the sake of hand-editing should be particularly strong.
If personal forgetfulness about editing around commas or date strings are strong arguments, then they should surely also apply to every programming language that uses commas or lacks the concept of a date literal. After all, the code of programming languages exists to be edited, and JSON is merely a means of serialization.
I would like timestamps introduced as a first-class citizen, and it can be done relatively cheaply. As the OP said, they are very common in protocols, and it would be nice to have one way of encoding them. Especially one that doesn't involve an arbitrary offset...
The other suggestions are interesting, but rather distracting IMHO.
JSON is a beast to deal with for anyone who's manually editing it. All the comments here about special formatting strategies to avoid comma problems, or how to workaround the lack of a timestamp type just prove the point that JSON needs to fix these things.
Would trailing commas and timestamps be nice? Yes. Would they be so nice that dealing with the JSON-to-JSON-1.1 transition would be worth it? Not really.