Hjson, the Human JSON (hjson.org)
257 points by 56k on April 14, 2016 | 213 comments



JSON plus comments and python-style multi-line strings is great.

The thing where you can leave quotes off strings makes me nervous, especially the example where the value is HTML with its own embedded double quotes for attribute values.

Not requiring quotes on strings like that looks like an obvious vector for injection attacks. I guess Hjson isn't designed to be generated automatically, but I'd prefer a format that is easy to generate safely.

What I really want is JSON plus comments plus multi-line strings plus relaxed rules on trailing commas... While maintaining as simple and unambiguous a parsing model as possible.


If trailing commas and quotation marks are optional, then does this:

    foo: 4,
...produce the value 4, "4", or "4,"?


Excellent example.

While I have been hoping for "JSON plus comments" to be a real and common thing for quite a while now, one of the strengths of JSON is right there on json.org. See it? A set of five simple syntax diagrams that entirely and virtually unambiguously define the json syntax.

It’s tough to know when to stop when simplifying a syntax. (For an extreme example, see Stylus, which was like Sass but so extreme that mixins and properties became ambiguous for each other.) I, too, would like to see the return of quotes for string values, for increased clarity.


From the docs:

  > When you omit quotes the string ends at the newline.
  > Preceding and trailing whitespace is ignored as are escapes.
  >
  > A value that is a number, true, false or null in JSON is 
  > parsed as a value. E.g. 3 is a valid number while 3 times is 
  > a string.
(edit: formatting)
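
A minimal Python sketch of the rule quoted above, applied literally: take the text up to the end of the line, trim whitespace, and treat it as a value only if it is a JSON number, true, false or null. The name `classify_quoteless` is hypothetical and this is not the Hjson parser. The quoted passage does not say how a trailing comma is handled, so the sketch simply leaves it in the string, which may differ from what a real Hjson implementation does.

```python
import re

# JSON number grammar: optional minus, integer part, optional fraction and exponent.
_JSON_NUMBER = re.compile(r"-?(?:0|[1-9][0-9]*)(?:\.[0-9]+)?(?:[eE][+-]?[0-9]+)?")
_LITERALS = {"true": True, "false": False, "null": None}


def classify_quoteless(raw):
    """Classify an unquoted value (the text after 'key:' up to the newline)."""
    text = raw.strip()  # leading and trailing whitespace is ignored
    if text in _LITERALS:  # case-sensitive, as in JSON
        return _LITERALS[text]
    if _JSON_NUMBER.fullmatch(text):
        # Keep integers as int, everything else as float.
        return float(text) if any(c in text for c in ".eE") else int(text)
    return text  # anything else is a string, with no escape processing
```

Under this literal reading, `3` is a number, `3 times` is a string, `true` is a boolean and `True` is a string.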


Right but the point is that you had to consult the docs. It's a problem of ambiguity on the part of the format.


It is ambiguous only if you think it can be a completely useless format. If you decide to use it, chances are you considered it useful, and the only useful way this can parse is as a number, otherwise it wouldn't be able to represent numbers, booleans and null. I'd say it parses pretty much as expected, and as such, contrary to other comments, you don't have to go back to the docs every so often.


Of course you have to get the information from somewhere the first time you learn something. What's the problem with that?


The problem is that this is supposed to be a more-easily-readable form of JSON, but it requires consulting the docs to understand the meaning of something which is completely unambiguous in regular JSON.


> Of course you have to get the information from somewhere the first time you learn something. What's the problem with that?

The first time, and the second time if you haven't looked at it in a month, and a third time a month after that. And so on.


So if you put "true" (without quotes) as a value, you get a boolean, but if you put "True", you get a string?


So would that be "4," that you end up with, then?


I came here to make the same comment. The ability to leave off quotes on strings is a misfeature which overly complicates the language. Also I see no need for three different comment styles.


Yes! There is the same ambiguity in Yaml, so syntax highlighters of the format don't agree where a string ends...

The site even has an ambiguous example:

    three: 4 # oops

Well, is that "4 # oops" or 4? I saw no rule about ending comments, and they say that three: 3 times is a string.

I have seen no formal specification of the grammar, so we already have lot of ambiguity. Good luck to implementers...

Seriously, removing the "no quotes needed" rule would improve greatly the format. If you want to include HTML with double quotes literally, just use the multiline string format and be done.


Use what you want, ignore the rest. Like C++.


That might work for programming languages (C++'s bad rep notwithstanding), but for data exchange formats you cannot cherry-pick features, you have to support the whole spec, and nothing else.


I think it's more nuanced than that. You must be strict in what you emit, but can be liberal in what you accept. That liberalness can go too far though, and should not make parsing brittle, or encourage misuse of the spec. It's there more to allow someone's unambiguous mistakes to still parse.

- Accidentally left in a final comma on a list? That's okay, that only means one thing, we understand.

- Allow non-quoted keys on objects? Well, we understand JavaScript generally allows this, so we'll let it slide. This time.

- Make newline significant and define new items? Okay, are we just ignoring space efficient payloads now? Should making it space efficient mean changing formats from Hjson to json?

- Considering all terms in place of an object value a string until a newline? Are you just trolling me now? How is that more human readable? Does your spoken language not use quotes to distinguish distinct chunks of communication or something, and if so, does it use a Latin alphabet so it's off-putting when you see them?

Needless to say, I'm really confused by the reason this even exists.


I think the problem it is solving is that JSON is designed and best used as a data exchange format, but it also gets used for configuration files, which it does okay with but is not really so good. INI files don't have a clear standard. YAML is too complicated, and using turing complete javascript for configuration seems like you've just gone too far.

we just need JSON, but with a couple things fixed up to make it nicer to use for configuration files.


> we just need JSON, but with a couple things fixed up to make it nicer to use for configuration files.

Using JSON for configuration is just the whole situation of using XML for data exchange redux. One of the major points for JSON over XML for data exchange was that it was so much better because it was optimized for data, not markup. Why are we ignoring this argument now that JSON is on the other side? JSON is used for configuration because it's ubiquitous, not because it fits the problem domain well. Let's just choose a more appropriate format.

Choosing the most common set of rules for INI files (what is proposed by Wikipedia[1] is probably sufficient) would serve us MUCH better than coaxing a data interchange format into that role.

1: https://en.wikipedia.org/wiki/INI_file


If you're using a strongly typed language, XML even has one (IMO) massive advantage over JSON: You can use XSD to define a schema declaratively. This means you get

(1) lots of general tooling support, in particular you get at least decent editor support for your config file, and

(2) you can autogenerate the code needed to read your configuration into structured data without having to do any unnecessary duplication of "key names" (tags) as you have to with e.g. INI or JSON. You also get the data structures themselves for "free" (based on the XSD).

Alright, it's not the end of the world to not have these things, but they're both very nice to have.


have you seen http://json-schema.org ?


Yes indeed I have, and I have several problems with it, one of which is "Expires: August 3, 2013" with no new version in sight. My other major problem is that AFAICT it doesn't support one of my pet favorite features namely "Algebraic Data Types"[0, 1]. If you extend json-schema like Swagger has done it might support ADTs properly AFAICT, but I don't have any actual practical experience with Swagger.

[0] https://en.wikipedia.org/wiki/Algebraic_data_type

[1] I should note that support for ADTs in XSD is sometimes sketchy in the various code generators, but at least the XSD specification supports it. (It could also be argued that XSD supports something even more general... which it probably shouldn't since ADTs basically cover the whole data structure space unless you go to higher kinds, inheritance and such.)


well basically virtually anything other than JSON that's actually designed to be configuration would be better for configuration. There's dozens of them. that is the problem: JSON parsers and generators are ubiquitous in the way that no single configuration format is. Right now, I can just use json in any language and get data between any language and any other language. If I use it for config I get the advantage of even being able to config across multiple languages that might be getting used in a single system if I have to. No proper configuration format has that level of mindshare and interoperation.


Something like edn[1] perhaps? Worked really well in my projects.

[1]: https://github.com/edn-format/edn



While I agree on some points, I do not agree in general. I think too much prescriptivism in protocol implementation is a naive approach, and assumes that we can always get things right initially. Sometimes real-world concerns and needs drive changes, not just sloppiness.


I prefer to openly discuss the features while giving the authors feedback instead of just ignoring what I don't want, but that's just me.


Features that cause ambiguity make things harder to do right.


Well, the first thing I thought of is what nightmare would it be to safely implement a parser for this in C. I filed a Github issue for this one: https://github.com/laktak/hjson/issues/37


Have you looked at JSON5? http://json5.org/


JSON5 looks really good. I'm not sure about this one part though, from a point of view of simplicity and interoperability when deserializing the data:

> Numbers can include Infinity, -Infinity, NaN, and -NaN.


This is a good thing. It means there is a 1:1 mapping between numbers in JSON5 and IEEE floats.
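
The interoperability concern can be seen in Python's standard `json` module, which by default already accepts and emits `NaN` and `Infinity` even though standard JSON has no such tokens. A small sketch of opting out on both sides; the helper names `strict_loads` and `strict_dumps` are hypothetical.

```python
import json


def reject_constant(name):
    # json.loads calls this hook for the tokens NaN, Infinity and -Infinity.
    raise ValueError(f"non-standard JSON constant: {name}")


def strict_loads(text):
    """Parse JSON, refusing NaN and the infinities."""
    return json.loads(text, parse_constant=reject_constant)


def strict_dumps(value):
    """Serialize to JSON; raises ValueError instead of emitting NaN/Infinity."""
    return json.dumps(value, allow_nan=False)
```

A producer that wants a 1:1 mapping to IEEE floats can keep the defaults; one that must interoperate with strict JSON parsers would use the strict variants.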


I've used json5 since the beginning. It's great! Just a couple of weeks ago they merged in descriptive error messages too. No more wondering where the bug is!


Strings without quotes leads to all kinds of trouble in YAML. You end up just quoting everything anyway the first time you need to use "false" or "[some words here]" as a string.


This ought to be fixed by making YAML explicitly rather than implicitly typed.

E.g.

   x: yes

   >>> str(yaml['x'])
   yes

   >>> bool(yaml['x'])
   True


> The thing where you can leave quotes off strings makes me nervous, especially the example where the value is HTML with its own embedded double quotes for attribute values.

Learn from Perl. The quote operator is your friend (and I frequently lament its omission in Bash). You could simplify it by not using the matching enclosures ({ and }, [ and ], etc). It's easy to parse, and if you keep the quoting character somewhat rare, it's not hard to read.

E.g.

    {
      "string" : "A string without inner quotes",
      "quotes1" : q!A string "with" inner quotes!,
      "quotes2" : q|A string "with" inner quotes|,
      "quotes3" : q@A string "with" inner quotes@,
      "quotes4" : qTA string "with" inner quotesT,
    }
Edit: To be clear, I wish JavaScript had a quote operator, and JSON started with it. :/

1: http://perldoc.perl.org/perlop.html#Quote-and-Quote-like-Ope...


I haven't used Perl in quite some time, and this sort of thing is why it was bad. Quotes are quotes are quotes in almost every language. It's completely unambiguous, the downside is that you sometimes need to escape them.

This, on the other hand, is a 'solution' to escaping quotes that is completely mad. Using non-standard quotes, especially mixing and matching them is a disaster for readability and maintainability (using a T in your string now? need to change the quotes!). Triple quotes are just fine if you want to avoid escapes, and hjson seems to support them.


> This, on the other hand, is a 'solution' to escaping quotes that is completely mad.

Meh, Perl's solution is fine. You can throw up your hands and say it's crazy, but as a person who worked with Perl for 20 years, I've never had the problem you describe. I tend not to use the qq() or q() style quotes, but I've used s@@@ and s,,, so many times I can't count. It's really quite nice (and perfectly readable unless you do something weird like 'sxxx').


> Quotes are quotes are quotes in almost every language. It's completely unambiguous

Oh, like in C and C++ where single quotes denote a char, and double quotes a string?

Or in Perl, PHP and Ruby where double quotes interpolate, and single quotes don't?

Or in JavaScript and Python, where there's no functional difference between single and double quotes?

Or C#'s string literals which only support double quotes, but you prefix the string with @ to denote it's verbatim?

Or systems that allow repeated double quotes within a double quote string literal to stand in for an escape ("foo ""bar"" baz"), as many SQL systems do?

Or what about systems that interpolate, and the differences between what they do and do not interpolate? Variables? Escape characters? Hexadecimal escapes?

You're fooling yourself if you think it unambiguous in anything except for the language you are dealing with, and if you're within that language, who cares what you use as long as it's consistent? You learn it, and then it's unambiguous (if implemented well).

This is no different than if your language supports hex numbers (usually done through prefixing it with 0x). Those are two different ways to specify the exact same thing (a binary number!). The benefit comes from using it in the circumstance where it's appropriate. That is, where it enhances readability, not where it detracts from it.

> This, on the other hand, is a 'solution' to escaping quotes that is completely mad. Using non-standard quotes, especially mixing and matching them is a disaster for readability and maintainability

Maybe you think

    "{\"foo\":\"bar\",\"baz\":\"it's that's they're we're\"}"
is fine, or you may prefer

    '{"foo":"bar","baz":"it\'s that\'s they\'re we\'re"}'
(if your language supports it).

I prefer

    q|{"foo":"bar","baz":"it's that's they're we're"}|
because I think it's clearer, and learning once that a literal q defines a new quote operator that is in effect until it's next seen is simple, easy to remember, and yields very useful readability gains.

> using a T in your string now? need to change the quotes!

I included the qTT example just to show how it worked, not to endorse its use. I thought that would have been obvious from my statement "You could simplify it by not using the matching enclosures ({ and }, [ and ], etc). It's easy to parse. and if you keep the quoting character somewhat rare, it's not hard to read."

In any case, I fail to see how that's a problem beyond any other quote character. Including that character in your string will result in a compile time error in all but the most esoteric of cases, making it easy to find.


And don't forget Tcl where { and } are actually quotes.


And similar with Rebol / Red where you have two distinct quoting literals...

  "string with no newlines"

  {A multi-line string
  that can run over over many lines
  and {even be} nested}
For unbalanced {} string then you would need to escape it...

  {escape closing brace ^}  &  opening brace ^{ is all you need}


Do check out http://json5.org/ based off the original JSON author Doug Crockford's own proposed parser extensions, primarily trailing commas and comments, both of which have been a point of contention pretty much since the day JSON landed. It's a much simpler and saner proposal.

Trailing commas and JSON comments are already supported in the newer browsers (try the Chrome console for instance).

Fortunately quoteless strings or optional-commas/newline-separator as proposed in Hjson will never fly. They are brittle and ambiguous. Who knows what will this get parsed as:

    {
        a: hello's and hi's have
            'misplaced' apostrophes
        b: ball: a round # and # bouncy object
        c: cakes and
            candy: both have sugar
            # but how do I include a hash at the start of a multiline-unquoted string?
    }


> Trailing commas and JSON comments are already supported in the newer browsers (try the Chrome console for instance).

Version 49.0.2623.112 (64-bit)

    > JSON.parse('{"foo": "bar",}')
    > VM124:1 Uncaught SyntaxError: Unexpected token }
Javascript object literals != JSON. JSON is a restricted subset of JS object literals (and not actually a strict subset: a JSON string can contain unescaped U+2028 "LINE SEPARATOR" and U+2029 "PARAGRAPH SEPARATOR" codepoints, a Javascript string can not)
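
The same strictness shows up outside the browser. A short Python sketch using the standard `json` module (the helper name `is_valid_json` is hypothetical): the trailing comma is rejected, while a raw U+2028 inside a string is accepted.

```python
import json


def is_valid_json(text):
    """Return True if the standard-library parser accepts the text as JSON."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


TRAILING_COMMA = '{"foo": "bar",}'   # fine as a JS object literal, not JSON
LINE_SEPARATOR = '"a\u2028b"'        # JSON string holding a raw U+2028
```

Here `is_valid_json(TRAILING_COMMA)` is false and `is_valid_json(LINE_SEPARATOR)` is true.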



YAML is more complex than most people tend to realize. (This was brought up in a 2011 discussion about possibly standardizing a metadata section for Markdown documents which sadly went nowhere. [1])

Take a look at example 2.11 in the YAML spec [2], for example, and see if you can make heads or tails of it.

[1]: https://pairlist6.pair.net/pipermail/markdown-discuss/2011-A...

[2]: http://www.yaml.org/spec/1.2/spec.html#id2760395


pyYAML has this collection of problems with the spec: http://pyyaml.org/wiki/BugsInTheYAMLSpecification

(and pyYAML itself can't always parse its own output correctly...)


You don't need most of those features. A pared down YAML with the cruft removed (implicit typing, flow style, tag tokens, node anchor & references) is actually pretty simple as well as less "gotcha-y".


Sure, but most language YAML parsers support all or most of the spec. That can be a problem if you aren't expecting it.


I believe it has even created security issues. Didn’t Rails have at least one YAML-based vuln?


You need to restrict YAML to SecureLoad, manually adding the allowed types and classes.

At least perl doesn't support this, so it's inherently insecure there, but you can always use YAML::Syck which didn't go this way.


From hjson.org:

"YAML expresses structure through whitespace. Significant whitespace is a common source of mistakes that we shouldn't have to deal with."


Okay, does anyone actually believe that causes a problem?

I can think of ONE time when that causes a problem, and that's with indentation with multi-line strings. Oh look, HJSON included that feature. That's like throwing the baby out and keeping all the bathwater.


I can't find a specific example off the top of my head but I'll say I've been managing a Jekyll site for a while now and whitespace errors in frontmatter and data files cause all kinds of problems. I'm not sure I could explain the details but it's a legit criticism of YAML. IMO part of the problem is that YAML looks very straightforward and is until it suddenly isn't. Whitespace is part of that problem.


As an anecdote on the flip side, I've been building Middleman sites for a while now and can't remember ever having an issue with whitespace in the front matter or local data.


Yeah I'm with you. Developers should have no problem dealing with whitespace and the result is you get an easier to read format.

Admittedly I haven't had to work with YAML a lot, but I have liked it when I've touched it.


yaml has any number of ambiguous cases


Could you provide some examples?


Unquoted strings are valid in yaml just like this format. There are at least 2 ways to specify a list of things. There are some super bizarre looking possible formats for lists of mapping types.

There are a number of others given the length of the spec. Yaml is a complicated beast that generally has more than one way to do any given thing


I like that you can leave the quotes off of keys, which should always parse as identifiers. (And those that don't should require quotes.) Leaving the quotes off values seems like the problem.


Take a look at HONEY: https://github.com/honey/honey

Primary goals were to remove as much syntax as possible and make it play well with line-based diffs (with the hopes that someone who knows nothing about the language could resolve conflicts without getting tripped up by surrounding quotes, trailing comments, etc).


Unfortunately, conflicts in white-space based languages can get even worse than regular conflicts because you have very few visual structural "anchors" to start to gain an understanding of the conflict. (If you have to resolve manually, that is.)

Granted, if the number of conflicts which cannot be automatically resolved is reduced by enough, then it might not matter in the grand scheme of things. However, I'd be worried that this would make "accidental" automatic resolution of semantic conflicts more common. That may be an unfounded/irrational fear, I don't know.


- Someone saves data as text with a simple format

- It works great, lots of people start using it

- People start adding features to fix annoying things with the format, add support for binary data, comments, schemas, add more metadata etc..

- Many versions proliferate, people start writing converters and verifiers

- A standards committee is formed and writes an 800 page spec and 80kloc reference implementation

- Eighteen different libraries wrap or reimplement the reference implementation

- Someone gets fed up with this nonsense and converts their app to save their data in a new simple text format.

- The circle of life continues.

I love this idea and wish json had comments, too, but if you start hitting the point where JSON is not expressive or fluid enough, that's a hint that it's probably not the right thing for what you're doing. This variant puts a lot of work into human-friendly json, but if you're doing a lot of hand-editing of a file, it should probably not be JSON.



> Suppose you are using JSON to keep configuration files, which you would like to annotate. Go ahead and insert all the comments you like. Then pipe it through JSMin before handing it to your JSON parser.

And now you can't roundtrip the comments if for some reason your JSON parser needs to change something.
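
A minimal sketch of the "strip comments, then parse" pipeline described in the quote, written in Python rather than with JSMin itself. The function `strip_comments` is hypothetical; it removes `//` and `/* */` comments that sit outside string literals and then hands the result to the ordinary JSON parser. As the reply points out, the comments are gone after this step, so writing the file back loses them.

```python
import json


def strip_comments(text):
    """Remove // and /* */ comments that appear outside JSON strings."""
    out = []
    i, n = 0, len(text)
    in_string = False
    while i < n:
        ch = text[i]
        if in_string:
            out.append(ch)
            if ch == "\\" and i + 1 < n:
                # Copy the escaped character so an escaped quote does not end the string.
                out.append(text[i + 1])
                i += 2
                continue
            if ch == '"':
                in_string = False
            i += 1
        elif ch == '"':
            in_string = True
            out.append(ch)
            i += 1
        elif text.startswith("//", i):
            # Line comment: skip to the end of the line, keep the newline.
            while i < n and text[i] != "\n":
                i += 1
        elif text.startswith("/*", i):
            # Block comment: skip to the closing marker (or to the end if unterminated).
            end = text.find("*/", i + 2)
            i = n if end == -1 else end + 2
            out.append(" ")  # keep neighbouring tokens separated
        else:
            out.append(ch)
            i += 1
    return "".join(out)


def load_commented_json(text):
    return json.loads(strip_comments(text))
```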


I would prefer not to inspect json for comments. If something is ambiguous, you should have a model definition sent along with everything else.


Agreed. Do not want "relaxed JSON". There are editors for valid JSON. Use one of those.[1]

[1] http://www.cleancss.com/json-editor/


This spec repeats one of the problems with using YAML as a configuration spec. To quote: "if your key includes a JSON control character like {}[],: or space, use quotes; if your string starts with { or [, use quotes"

JSON and YAML are interchange formats, not configuration formats. Rather than hacking up an interchange format, it's probably better to use something designed for configuration, like TOML.


I think it's reasonable to call YAML a configuration format.


JSON is an interchange format, but YAML is pretty obviously meant for configuration, given the huge emphasis on human readability and editing.

As for TOML, it's a good replacement for mostly-flat INI-style files but the syntax is really awkward for the kinds of places you'd normally use YAML, especially nesting lists/maps.


Top line on yaml.org:

What It Is: YAML is a human friendly data serialization standard for all programming languages.


or SDLang


Rigidity and consistency are not always bad things. They can help prevent bugs, security vulnerabilities, and they drastically reduce complexity of implementation.

JSON might often be too rigid, but I think it's important to note that "easier" (in that you don't need to learn the syntax) isn't always better.


This "easier" format is actually more "complex". JSON can be described with just a couple of simple rules (http://json.org/ ), while this format adds many new rules (e.g. for doing same things in multiple ways). The more rules you have, the more things you have to remember, the harder it is to find the reason of the problem.

BTW. it is very simple to do comments in JSON :) You can just add "comment1" : "This is my comment", to any object, it will be ignored by software that processes your file.


I kind of like doing comments like this:

    "#": " my comment"


the problem with that one is that some parsers have a meltdown if they encounter duplicate keys. sigh
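
How duplicate keys are handled depends on the parser. Python's standard `json` module, for example, silently keeps the last value. A small sketch (the function names are hypothetical) that uses `object_pairs_hook` to drop every `"#"` pseudo-comment key, repeated or not, before the object is built:

```python
import json


def drop_comment_keys(pairs):
    # pairs is the list of (key, value) tuples for one object, duplicates included.
    return {key: value for key, value in pairs if key != "#"}


def load_with_comment_keys(text):
    """Parse JSON, discarding any "#" keys used as comments."""
    return json.loads(text, object_pairs_hook=drop_comment_keys)
```

This only helps with parsers that expose such a hook; a parser that rejects duplicate keys outright will still fail on a file with two `"#"` entries in the same object.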


Ha ha, I do exactly the same thing. Lack of comments is really the only beef I have with Json (well, no multi line text too I suppose).

A new spec that just addressed those alone would be great.


Yes. There are two distinct use cases here: configuration files, and interchange formats.

For an interchange format, JSON does the job very well. Small, simple, human readable, easy to implement.

For a configuration format, JSON leaves a lot to be desired. It's almost there, but has enough warts to be annoying.

You're not going to get a one-size-fits-all format.


This sounds more like Python dictionaries. I don't see more bugs when I am using those over JSON, and I find them a lot easier to work with.
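
For what it's worth, Python's literal syntax does allow comments, trailing commas and multi-line strings, and the standard library can read it without executing code. A hedged sketch, with a hypothetical `load_literal` helper and made-up sample data:

```python
import ast

SAMPLE = """{
    # port the service listens on
    'port': 8080,
    'hosts': ['a.example', 'b.example',],  # trailing commas are fine
    'motd': '''line one
line two''',
}"""


def load_literal(text):
    # ast.literal_eval only evaluates literals (dicts, lists, strings, numbers, ...),
    # so function calls and other expressions raise ValueError instead of running.
    return ast.literal_eval(text)
```

The obvious cost is that the file is only readable from Python, which is the interoperability JSON was chosen for in the first place.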


JSON5 [0] is better as unlike Hjson, it doesn't include non-ECMAScript syntax.

[0]: http://json5.org/


And yet again no date format. Am I the only person who is ever inconvenienced by this? It seems like it's so obviously the most glaring flaw with JSON that I'm surprised nobody wants to fix it.


Just use ISO 8601?


Sure, that would be fine. They could also use a totally wacky format, I don't care. The problem is that without a standard the JSON tools in different languages don't all agree on how to serialize dates and you end up needing to deserialize/serialize manually to smooth over the differences. The problem isn't that you can't represent dates; it's that there is no standard way to do it.


> The problem isn't that you can't represent dates; it's that there is no standard way to do it.

ISO 8601?


That's not "standard." That's one of many ways that people represent dates in JSON, because the JSON specification itself does not mandate that you use any particular representation. If you use it there is no guarantee it will be recognized as a date on the other end.


What black hole are you sending JSON into where the only way they could know something is a date is if you use a date type? Why can't that be part of the data structure you agreed upon in order to communicate in the first place?


Even if there is an agreed on schema, parsers can generate native Date objects on the recipient if there is a date type. When you deserialize a nested graph of objects, it's hard to convert each date to an actual date if what you get is just a string. Makes it a lot easier during integration.
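
A sketch of the "agree on it in the schema" approach from this subthread, in Python: dates travel as ISO 8601 strings, and the receiver converts only the fields it has agreed are dates. The field name `created_at`, the `DATE_FIELDS` set and the helper names are all assumptions made for illustration.

```python
import json
from datetime import datetime, timezone

# Hypothetical agreement between sender and receiver: these keys hold ISO 8601 dates.
DATE_FIELDS = {"created_at"}

SAMPLE = {"title": "Hjson", "created_at": datetime(2016, 4, 14, 12, 0, tzinfo=timezone.utc)}


def encode_default(obj):
    # Called by json.dumps for objects it cannot serialize on its own.
    if isinstance(obj, datetime):
        return obj.isoformat()
    raise TypeError(f"cannot serialize {type(obj).__name__}")


def decode_dates(obj):
    # Called by json.loads for every decoded object, including nested ones.
    for key in DATE_FIELDS:
        if isinstance(obj.get(key), str):
            obj[key] = datetime.fromisoformat(obj[key])
    return obj


def dumps(value):
    return json.dumps(value, default=encode_default)


def loads(text):
    return json.loads(text, object_hook=decode_dates)
```

This is exactly the hand-written supplement the thread complains about: it works, but both ends have to carry the same list of date fields because the format itself cannot mark them.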


Why am I hand-writing code to supplement the parser? I mean, hell, we don't need JSON at all, why don't I just invent my own serialization format and write parsers for it in each environment I want to use it in.


Well, it'd need to cover dates, times with time zones, date times with time zones, times without time zones, date times without time zones. Seems better pegged to the data source without thinking it out clearly.


So? It's not like this is a complex problem nobody knows how to solve.


Agreed! But it's not so simple to just add a "date" type; it would add significant complexity to a relatively simple text format.


Yeah, I guess, but everyone ends up implementing it anyway; the problem is they all do it in an idiosyncratic way (and one that is ambiguous... after all, maybe I really wanted that to be a string).


That is a string. Not date.


SDLang has support for date formats.


Cool. Never used D and that's the first I've heard about it but I have respect for any system that handles this problem. :)


Why do you need ECMAScript syntax? It's not like you're going to directly feed a json file into a JS interpreter. Well, not unless you have a death wish.


Because it's a syntax that needs no extra explanation.

It's not a superset, so no extra add-on features need extra doc, and it's less of a subset than JSON so the rules of what's disallowed are much simpler.


I feel like HOCON fills the space pretty well, and has implementations in most languages now. https://github.com/typesafehub/config

But I'm a scala developer so I might be biased.


I'm not a Scala developer, and I don't think I'd ever heard of HOCON. Has it seen much adoption outside the Scala community?


I have a (big) .NET project that uses multiple libraries one of them being Akka.NET which uses HOCON. Being honest, it felt completely awkward to use it in this respect (a little corner in the project that uses that different HOCON thing).


What's the difference to / advantage over YAML? When I want a loose syntax "JSON with comments", I can just use that instead.


> YAML expresses structure through whitespace. Significant whitespace is a common source of mistakes that we shouldn't have to deal with.

> Both HOCON and YAML make the mistake of implementing too many features (like anchors, sustitutions or concatenation).


But this too has significant whitespace, acting as a comma in separating properties. (It's not clear whether you can mix commas and whitespace here.)

Also this claims to not need escapes, but it's also not clear how this format handles a comma or a newline in strings without escaping, do they act as a comma to separate properties or do they act as natural commas/newlines?


Only quoteless strings have no escapes, and according to the docs the rule is:

  > quoteless strings include everything up to the end of the
  > line, excluding trailing whitespace.
(edit: formatting)


So this

    {
      foo:one,
      bar:two
    }

Parses to "one," because it is a quoteless string?

What about true and false, is false the boolean constant or a unquoted string "false".


I posted this in another thread here, but it's documented in the linked page:

  > A value that is a number, true, false or null in JSON is parsed as a value.
  > E.g. '3' is a valid number while '3 times' is a string.


>YAML make the mistake of implementing too many features (like anchors, sustitutions)

That's why I remove those features.

Significant whitespace is a normal complaint for beginners in python too, but most people prefer it in the end.


A point against YAML, that I always read, is its specification, too big etc. I don't know much about this issue...

I like how nice config files can look with YAML and JSON being a subset of it makes it even more convenient


YAML spec: http://yaml.org/spec/1.2/spec.html

JSON spec: http://www.ecma-international.org/publications/files/ECMA-ST...

If you haven't read the JSON spec and you use JSON I recommend doing so. It takes five minutes. My personal favorite line: "Because it is so simple, it is not expected that the JSON grammar will ever change."


I did, long time ago.

Also, they're probably right. I mean JSON isn't pretty, but it's easy. Why should I use YAML for prettier config files, when the people looking into those files are all technical anyway?


HJSON isn't exactly a nicely compact spec I'd want to use in data exchange either. For local configuration files and the likes, where you want comments etc., it doesn't really matter how big the spec is.


"Both HOCON and YAML make the mistake of implementing too many features (like anchors, sustitutions or concatenation)"

YMMV, but if you're aiming for a format that's edited / maintained by humans things like YAML's anchors and substitution are exactly the features I'd want...


Just the other day I commented complaining about JSON config files without comments, but now here I am complaining about _three_ ways to write a comment. OK, I can see two ways: block and line comments. But why two ways to write line comments? Why start off a new grammar with that added complexity?


I also notice everyone seems to keep silent about the fact that comments were intentionally left out of the final JSON specification to avoid abuse by parsers/vendors.


Comments would be nice but it is also nice to keep JSON pure and simple. There are some other json formats that use comments like jsoncpp but really not needed.

But if comments really are needed, another easy way to have them is a file that rides alongside the JSON file or docs. Sometimes we use a markdown/text file next to it (file.json -> file.json.md / file.json.txt) to describe it overall, or a file.meta.json that has comments per key. This is only needed sometimes for physical files. If the JSON is from the server, commenting can be done there or in docs if needed.


The notion that comments are neither simple nor really necessary is completely bonkers to me, or anyone else who has implemented a lexer or tried to debug an undocumented config file.


Added complexity for parser implementors, but arguably more freedom for the user. (?)


But why?


You mean why bother having multiple ways to comment?

Well, I sometimes use both C-style (/* ... */) and C++-style (// ...) comments in the same file to sort of mark the relative importance of comments. But yeah, I guess that's not strictly necessary, I could do without.


Ah, yes, the trailing comma in a list, which I like to refer to as the "Silicon Valley Comma"

[ "a", "b", "c", ] // the silicon valley comma
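
For reference, strict JSON parsers reject that comma. A minimal sketch using Python's standard-library json module; the helper name parses_as_json is made up for illustration:

```python
import json

def parses_as_json(text):
    """Return True if text is accepted by a strict JSON parser."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False

print(parses_as_json('["a", "b", "c"]'))    # True
print(parses_as_json('["a", "b", "c", ]'))  # False: trailing comma is not in the grammar
```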


Of course it looks ugly if you list things horizontally... :)


If that's the Silicon Valley Comma, what do you call this?

    [
       "an"
     , "array"
     , "of"
     , "things"
    ]
Because I have an irrational hatred of that style. (Yes, I know the purported benefits when diffing files, I don't care :P)


That's not as good as the SVC because you'll still get unnecessary diffs when prepending to the list

SVC allows you to both prepend and append without adding extra diffs
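
A rough way to check the diff claim, sketched with Python's difflib. The helper changed_lines and the sample lists are hypothetical, and a real VCS diff may differ in detail:

```python
import difflib

def changed_lines(before, after):
    """Return only the added/removed lines of a zero-context unified diff."""
    diff = difflib.unified_diff(
        before.splitlines(), after.splitlines(), lineterm="", n=0
    )
    return [
        line for line in diff
        if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
    ]

# No trailing comma: appending "c" also touches the "b" line.
plain_before = '[\n  "a",\n  "b"\n]'
plain_after = '[\n  "a",\n  "b",\n  "c"\n]'

# Comma-first: prepending "z" also touches the "a" line.
comma_first_before = '[\n  "a"\n, "b"\n]'
comma_first_after = '[\n  "z"\n, "a"\n, "b"\n]'

# Trailing comma: appending or prepending adds exactly one line.
svc_before = '[\n  "a",\n  "b",\n]'
svc_append = '[\n  "a",\n  "b",\n  "c",\n]'
svc_prepend = '[\n  "z",\n  "a",\n  "b",\n]'

print(len(changed_lines(plain_before, plain_after)))              # 3
print(len(changed_lines(comma_first_before, comma_first_after)))  # 3
print(len(changed_lines(svc_before, svc_append)))                 # 1
print(len(changed_lines(svc_before, svc_prepend)))                # 1
```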


It's a nice thing to have when you generate JSON programmatically :)



Looks like a solution in search of problem, to me. JSON is designed to be machine-readable, and to the extent that I actually need to human-read JSON, which is not that much, I don't find it all that difficult.


Standards are better when they are followed. Chicken and egg problem, these alternatives are DOA because they are not going to be popular.

The only reason JSON is popular is because of Javascript. And the only reason Javascript is popular is because of the browsers and their history.


I've used it with great results in my procedural planet generator[1]. It's very forgiving, which made writing a "UI" with lots of complicated controls very easy for me.

[1] http://wwwtyro.github.io/planet-3d/


nice little project you've got there :)


JSON parsers are not really slow. JSON is simple enough to allow multiple parser implementations and easy adoption. But the HJSON additions carry some serialization overhead.

Because of this, you will eventually need to convert your HJSON to JSON prior to deploying, and that would make things slower. You will be dealing with two formats instead of one.

Then, do you really believe that adding all these syntactic "features" (overhead) will make it less error-prone? It will make it more error-prone because it has more things to consider!


It's supposed to be used for config files like package.json and friends in the javascript world.

It's going to be parsed essentially once---startup.


In that regard I prefer HOCON.


Oy, someone loves yaml...

I'm quite happy using a preprocessor like [0], which keeps the great simplicity of JSON and just allows comments.

[0] https://www.npmjs.com/package/strip-json-comments
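
The idea is small enough to sketch. Below is a rough Python version of such a preprocessor, not the linked package's actual implementation; it only assumes // and /* */ comments and leaves anything inside a string alone. The function name and sample config are made up:

```python
import json

def strip_json_comments(text):
    """Remove // line comments and /* block */ comments outside of strings."""
    out = []
    i, n = 0, len(text)
    in_string = False
    while i < n:
        ch = text[i]
        if in_string:
            out.append(ch)
            if ch == "\\" and i + 1 < n:
                # Copy the escaped character verbatim so \" does not end the string.
                out.append(text[i + 1])
                i += 2
                continue
            if ch == '"':
                in_string = False
            i += 1
        elif ch == '"':
            in_string = True
            out.append(ch)
            i += 1
        elif text.startswith("//", i):
            # Skip to the end of the line, keeping the newline itself.
            while i < n and text[i] != "\n":
                i += 1
        elif text.startswith("/*", i):
            # Skip to the closing */ (or to the end if it is missing).
            end = text.find("*/", i + 2)
            i = n if end == -1 else end + 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)

config = '''
{
  // listen address
  "url": "http://example.com/*not a comment*/",
  "port": 8080 /* default */
}
'''
print(json.loads(strip_json_comments(config)))
```

The output of the preprocessor is plain JSON, so any standard parser can take it from there.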


> Oy, someone loves yaml...

I don't, actually, I use preprocessors too¹, but since they're not always an option, I'd rather recommend yaml than have yet another pointless config file language needlessly fragment the market.

¹: Thankfully it's rather trivial in python: https://github.com/creshal/yspave/blob/master/yspave/pave.py...


"Helps reduce errors" - you're really trading errors for other errors - your behavior is now more ambiguous with more edge cases, but look, you don't have to place quotes around strings! (except when you still do)


Can someone explain why people want to use a data-interchange format like JSON for configuration files, rather than using a configuration file format like TOML? I've never understood why people want to use JSON for config files.


I agree about JSON, but I can't say I understand the logic behind TOML.

I know it's supposed to be a config format, but it only seems to make any sense for INI-like configs that are little more than a flat key-value map.

The places I see people using JSON/YAML/etc for config are much more likely to have nested structures that would be extremely awkward to represent in TOML. I think YAML was on the right track, and if you ignore the messier parts of the spec it works pretty well.


I was mainly using TOML as an example of a config format in contrast to JSON, I wouldn't say it's definitely the answer. Nesting is possible with TOML, but I'd agree that it could get pretty awkward depending on your needs: https://github.com/toml-lang/toml#array-of-tables for example.

I personally don't mind YAML all that much either, although the spec is pretty large.


pretty much everything under the sun can encode and decode json, often as part of the stdlib

toml requires you track down a toml parser, at the very least


Yeah, that's fair. I've always found JSON cumbersome as a config format though (like in Packer), and there are plenty of TOML parsers out there (https://github.com/toml-lang/toml#v040-compliant).

Maybe that's an argument for languages to start adding some configuration format other than XML into their standard libs.


This abandons a lot of principles of JSON that are there to avoid ambiguous situations. The small benefits don't seem to outweigh the snake pit you're jumping into.


It has always bothered me that the JSON standard does not allow for comments. Especially when you want to annotate some sample response/request.


They were removed because people started trying to use them to hold additional parsing directives and other meta-information which would have destroyed interoperability and defeated the entire purpose of a simple interchange format. See: https://groups.yahoo.com/neo/groups/json/conversations/topic...

If you want to annotate JSON in documentation, I say "go ahead and just use //". Any programmer reading it will understand that those lines are taken to be comments and they shouldn't type them in their final request.


It's just a hack, but you can write a comment in a key that is supposed to be ignored:

  {
      "__comment": "The following config does...",
      "key": "value"
  }
But I agree that it is not very intuitive.
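
On the consuming side, the hack can be undone after parsing. A minimal Python sketch, assuming the convention is "any key starting with __ is a comment"; drop_comment_keys is a hypothetical helper:

```python
import json

def drop_comment_keys(value, prefix="__"):
    """Recursively remove object keys that start with the comment prefix."""
    if isinstance(value, dict):
        return {
            key: drop_comment_keys(item, prefix)
            for key, item in value.items()
            if not key.startswith(prefix)
        }
    if isinstance(value, list):
        return [drop_comment_keys(item, prefix) for item in value]
    return value

raw = '{"__comment": "The following config does...", "key": "value"}'
print(drop_comment_keys(json.loads(raw)))  # {'key': 'value'}
```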


JSON isn't semantic, so there is no need to put __ before comment; it's not like it's meta or something. It will still be parsed and processed by the JSON parser, which is a waste of computer cycles.


It's kind of a future-proofing thing. If you put a field called "comment" in a JSON blob, especially one in a format you don't control, you run the risk that future versions of the format will define the "comment" field and give it actual meaning. A crazy prefix makes this at least slightly less likely.


...and the lack of trailing commas.


It's an oversight, but does it really matter anywhere? YAML can be used for a JSON-y syntax that allows comments, which is good enough for most use cases.


IIRC douglas crockford purposely didn't include comments



So definitely a thought out and purposeful decision, thanks for the links.


To be honest: relaxed formats usually bring a lot of glitches to keep in mind. It's probably easier to use stricter specifications.

Take YAML, it looks pretty natural at first sight, but has a virtually infinite list of gotchas.


>has a virtually infinite list of gotchas.

That's why I wrote this:

https://github.com/crdoconnor/dumbyaml

YAML is far better with explicit typing and flow style, tag tokens and node anchors/references removed.


YAML already exists. CSON is also worth a look.


I find YAML/cson very difficult to read.


the json spec is pretty easy to follow, even for a human coder. this is overkill imo


In fact, reading the doc for this takes longer than reading the JSON spec.


Shouldn't we keep our format specs simple and strict, and relegate aesthetic, and typo-correction stuff to the editor?

i.e. something with aspects of clang-format (which tries hard not to change the meaning of your code even if it's broken), and the aggressive autocorrection necessary to make typing on a touchscreen work?

I suppose there are converters from this to json, though, so maybe this is just a better specified way of converting keypresses from monkeys into something with well defined structure...


Honestly, I thought JSON was already very human readable/writable.


JSON is very readable/writable... to programmers. Most humans aren't programmers though.

If you showed JSON to someone on the street they could probably understand the gist of it (if pretty printed). Good luck asking them to write it.


Your argument is that a layman can't puzzle out JSON easily, but do you think hjson is any better? Syntactically its core seems to just be QOL improvements over writing JSON by hand. It isn't any more intuitive from my perspective. In fact, in many cases it seems less intuitive because it offers a greater number of ways to do the same thing.


I agree completely. If you're going for human, you need to go a lot further (which is what I've tried to do with HONEY [1]).

[1]: https://github.com/honey/honey


I'll agree with that, but what is the use case for giving this to non-developers? Everything on this page looks like it's directed towards making JSON files slightly simpler for programmers.


I was referring to your comment about JSON in general. See my earlier comment [1] for more background.

[1]: https://news.ycombinator.com/item?id=11501332


"Trailing commas are ignored." This is the most important :p


Hi! I like what you've made. I have been working on something similar, although the Github is massively out of date and was never complete to begin with: https://github.com/narfanator/YAMLite (Also, I'm renaming it nowish on the up-to-date version).

This parser handles YAML, JSON and XML. Interestingly, many of the features HJSON has, this has, by virtue of it being easier to implement during the parsing stage.

The part I'd draw your attention to - and the part that I think warrants the most discussion - is the resulting data structure. I mostly can't tell what the structure is of the HJSON C# object - it looks like it does most of what I wanted to change about the existing C# JSON parsers, but maybe not all?


This feels like a project that had a core idea that was good and justifiable (we'll take some of the common JSON mistakes such as extra commas and the most asked-for features such as comments) and then felt the need to keep throwing in features to justify its existence, and now it's lost sight of its original goal.

This can't even be parsed natively by major JavaScript implementations, so is it really JSON at all? Actually, I think that's the root of my complaints: that it's associating itself with JSON while clearly diverging from what was important originally in JSON. At this point it's just some incompatible format leveraging the JSON name. I think most of my criticisms would be ameliorated if it was just some other JSON-similar format with a different name.


This looks pretty great.

We've been looking for a replacement configuration format over our ancient ini files and had rejected JSON for TOML because TOML allows comments (and man, can comments be useful in configuration files). This looks like a nice medium-long term alternative.


Congratulations, you've created a YAML-like, unsafe data structure.


I don't understand. Doesn't commenting ruin the whole point of a human-readable format? If you have to add comments, it means you need to communicate something that could be expressed in a more concise way.


what?

comments are the most important factor of 'human-readability' by far. without them you can't e.g. explain what a particular key does, what its default value is (if any), or perhaps the most important thing - you can't even put a link to the documentation!


what does 'human-readability' mean? JSON is a data-transmission format that is human readable.. which is why it happens to be more popular than XML (or SOAP) because you can read the data and see the data.

It's not meant to transmit context (i.e. it's useless and that's what documentation is for).


Human-readable json would've been a great idea, if it wasn't for the fact that json was NOT MEANT TO BE HUMAN READABLE.

It is meant to be generated by a machine and not created by hand, nor should it be readable by humans, only parseable by a computer.

Treating your data interchange/serialization/configuration/markup formats as languages that should be human readable/writable is a cardinal sin of any person or company that engages in such practices.


Just wanted to bring libucl into the game in case you are exploring json-like syntax: https://github.com/vstakhov/libucl#improvements-to-the-json-...

In my eyes pretty much the perfect configuration library and syntax. Nginx-alike, number suffixes (1min, 2gb, ..), macros, variables, includes with priority, etc. Boom! Problem solved.


And I should mention SDLang, which has been around for a few years now and has reference implementations in Java, C# and D. I don't see why we need to reinvent the wheel again and again...

https://github.com/Abscissa/SDLang-D/wiki/Language-Guide


> Significant whitespace is a common source of mistakes that we shouldn't have to deal with.

...except you just made it significant.

This is JSON.

    {"a":1,"b":2,"c":3}
This is Hjson after applying the mods:

    {a:1b:2c:3}
....oh, it turns out Hjson actually does have significant whitespace.


Where is this drive to create dumb languages coming from? I don't mean dumb in the opinionated way, I mean dumb in the way of "it's hard to write proper code that follows rules, quoting and commas are hard" ... ... ... if quotes, commas and escaping are hard for you, you don't need to be an engineer....


This looks cool, but their characterization of YAML is disingenuous

    YAML expresses structure through whitespace. Significant
    whitespace is a common source of mistakes that we
    shouldn't have to deal with.
since every code editor ever used will take care of this for you.


Nice idea. A nit: making trailing whitespace significant (rather than stripping it) seems like a bug.


The Jaunt JSON parser (Java) was my solution to this problem. It can handle arbitrarily dirty data, including missing quotes, using semicolons instead of quotes, etc.

http://jaunt-api.com


Grammars people! If you don't provide it / promote it, I assume your "nice simple thing" is neither obvious nor thought-through. Sorry.

Help me with my list of trendy things that took or are taking way too long to get a grammar: docopt, semver...



Comments, readability and typability were among the main reasons I recently chose YAML for a configuration file. It seems YAML is a bit unloved these days, perhaps because it is more difficult to parse fully.

YAML references also proved useful in my use case.


This is overkill, but someone PLEASE add comments to the next iteration of the json spec.


There is also libucl[1], which is kind of JSON-like. FreeBSD is apparently starting to use it for a few things.

[1]: https://github.com/vstakhov/libucl


I thought it looked cool when they said "nginx-like". But then I remembered that nginx has annoying string semantics and converts things to arrays in odd circumstances.


Off topic, but related: I made a simple data-serialisation file format based on S-expressions, which may be of interest to some: http://loonfile.info/


HONEY was my take at the same problem. Still brainstorming on this one and not used in production yet.

https://github.com/honey/honey


I'm not using a nonstandard JSON extension unless it implements a standard freaking date format. I also don't think bare strings are necessarily a great idea since they lose implicit type information.
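
To illustrate the date gap in plain JSON, here is a small Python sketch. ISO 8601 strings are one common convention, not something the JSON spec defines, and encode_default is a made-up helper name:

```python
import json
from datetime import datetime, timezone

def encode_default(obj):
    """Fallback encoder: write datetimes as ISO 8601 strings."""
    if isinstance(obj, datetime):
        return obj.isoformat()
    raise TypeError(f"not JSON serializable: {type(obj).__name__}")

stamp = datetime(2016, 4, 14, 12, 0, tzinfo=timezone.utc)

# Without a fallback, json.dumps({"posted": stamp}) raises TypeError.
text = json.dumps({"posted": stamp}, default=encode_default)
print(text)  # {"posted": "2016-04-14T12:00:00+00:00"}

# The value comes back as an ordinary string; the reader has to know
# out of band that it should be parsed as a date.
decoded = json.loads(text)
restored = datetime.fromisoformat(decoded["posted"])
print(restored == stamp)  # True
```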


Here's a previous HN discussion on Hjson - https://news.ycombinator.com/item?id=8432678


So it's YAML with all the cruft of JSON thrown back in?


Nope, just a readable JSON. YAML does much more, for many too much.


It's almost as if you're reinventing protobufs. XD


Is "undefined" part of json? Imho it should be.


If the key is not present, the value is "undefined" in JS. "null" is a supported value in JSON, to explicitly mark something as nonexistent.

For me the biggest problem with JSON is lack of full floating point support, i.e. NaN, +-Infinity, -0.
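
A quick illustration with Python's json module, as one data point rather than a statement about every parser. By default it emits NaN and Infinity tokens that are outside the JSON grammar, and allow_nan=False makes it refuse instead:

```python
import json
import math

# Default behaviour: non-standard tokens that strict parsers may reject.
print(json.dumps(float("nan")))   # NaN
print(json.dumps(float("inf")))   # Infinity

def strict_dumps(value):
    """Serialize, refusing values that standard JSON cannot represent."""
    return json.dumps(value, allow_nan=False)

try:
    strict_dumps(float("nan"))
except ValueError as error:
    print("rejected:", error)

# Negative zero can be written as -0.0, and this module reads it back
# with the sign intact; other implementations may not preserve it.
negative_zero = json.loads(json.dumps(-0.0))
print(math.copysign(1.0, negative_zero))  # -1.0
```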


You can stringify an object containing a property with an undefined value. Imho, parsing that back should give exactly the same data structure.


Soon: Situation: there are 15 competing standards.[1]

[1]: https://xkcd.com/927/


Also in this space, Jsonnet is well worth a look http://jsonnet.org/


+1

jsonnet gives all the benefits of hjson - but also provides more powerful templating features


YAML's problem is not "significant whitespace" which really isn't a major cause of mistakes.


so in this language these all do the same thing:

abc "abc" abc, "abc",

How does increasing the scope simplify things? Defining correct as "crashing less often" is a really bad idea, data formats _should_ be strict.


Possible bug: duplicate keys are not flagged as errors (in the demo at least)
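
Plain JSON parsers often behave the same way. For comparison, a Python sketch: the standard json module silently keeps the last value, and a hook (here a hypothetical reject_duplicates) is needed to turn duplicates into an error:

```python
import json

def reject_duplicates(pairs):
    """object_pairs_hook that raises on repeated keys within one object."""
    seen = {}
    for key, value in pairs:
        if key in seen:
            raise ValueError(f"duplicate key: {key!r}")
        seen[key] = value
    return seen

text = '{"a": 1, "a": 2}'

print(json.loads(text))  # {'a': 2}  (last one wins, no error)

try:
    json.loads(text, object_pairs_hook=reject_duplicates)
except ValueError as error:
    print("rejected:", error)
```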


This looks a lot like CSON but with better comment support.


> Let's make quotes for strings optional as well.

This will end in tears.


"less mistakes" -> "fewer mistakes"


Seems like it would be slower to deserialize.


JSON for humans is called yaml.


I'd rather just use YAML.


YAML is insecure by default, JSON secure by default.

There are not many secure transportation formats, JSON being one of the few. And JSON can be parsed much faster and easier than YAML, with its types, cyclic references and classes.


No include :-(


Is there include functionality I missed, or do people think config files shouldn't have includes?


    > , less mistakes, 
Does someone have a rather subtle sense of humour, or is that a genuine mistake?! (fewer*)


Worst name ever. I'm not even joking... Perhaps it would be less offensive if the last three letters didn't spell the word "SON."

This seems pedantic, I agree, but thus is the world we live in...

Might I suggest "Human Readable JSON."


I must be having a stupid day - what is offensive about this name?


I think it must be playing off the complaints about the Brotli compression algorithm using "bro" as a file extension. "Bro" and in this case "son" implying some sort of patriarchal dominance over computing.

In short: a joke.



