TOML: Tom's Obvious Minimal Language

SPBS · on May 21, 2023

JSON is basically perfect if it allowed trailing commas and comments. TOML is not a replacement for JSON because of how badly it chokes on nested lists of objects (being both hard to read and hard to write), due to a misguided attempt to avoid becoming JSON-like[1].

[1] https://github.com/toml-lang/toml/issues/516

arp242 · on May 21, 2023

That's already fixed; in the upcoming TOML 1.1 you can write:

  tbl = {
     hello = "world",
  }

All the examples in the issue you linked should work.

https://github.com/toml-lang/toml/pull/904

bdudbdjsh · on May 21, 2023

published data standards having version is so wrong.

now if you see "config in yaml" you know nothing, zero, nada, about the format because all versions are so different and everyone implemented the version at the time and didn't bother to mention version. not to mention you can use a dozen syntaxes for yaml/toml and each application may not understand them all.

all this is so silly. we will stick with json and informal-ini forever one way or another.

arp242 · on May 21, 2023

It's not ideal, I agree, but it solves real problems for people, so there's that.

Many commonly-used standards today weren't created by a bunch of wise men in a room thinking how to bestow their wisdom upon the rest of us, they often originated in real-world applications, were refined over a period of time based pn real-world experience, and then became a standard.

TOML is the same. I hope it will become an RFC some day. We just need to fix a few outstanding issues first.

Most commonly used TOML parsers support 1.0; adding 1.1 support should be pretty easy as the changes aren't that large (I did it in the parser I maintain, and it's 10 lines of code or so that had to be changed, most of them quite trivial).

Marazan · on May 21, 2023

My opinion is that having a version is fine buy only if a version field is built into the spec.

Without that your config file is a boobytrap, with it everything is fine and your parsing libraries can even be backwards compatible.

eviks · on May 23, 2023

Since it's impossible to design anything great in tech on first attempt, what is the road to improvement without versions?

(and mentioning version could be a requirement in a future version)

ruuda · on May 21, 2023

And if you have to version it, Kelvin versioning is more appropriate for standards.

IshKebab · on May 21, 2023

How do you know which version you are using though?

arp242 · on May 21, 2023

You don't; you check the library or application's documentation.

TOML went through some substantial changes in the past with 0.4, 0.5, and 1.0. As I mentioned in my other comment[1], it's not ideal, but it is what it is.

I wouldn't be surprised if 1.1 would be the last version. Maybe there will be a 1.2 to clarify some things, but I wouldn't expect any further major changes.

[1]: https://news.ycombinator.com/item?id=36020654

IshKebab · on May 21, 2023

> You don't

Great. Very obvious.

josephg · on May 21, 2023

As much as it pains me to say so, this is probably fine for configuration languages so long as they’re backwards compatible. Eg toml is used by rust’s cargo tool. Cargo can just say “hey Cargo.toml is parsed in toml version 1.1 format”.

IshKebab · on May 22, 2023

How does your IDE and linter learn that?

In fairness it's probably not too bad as long as everyone actually migrates to the newest version eventually... But that isn't guaranteed - look at YAML. Or even JSONC. VSCode has a hard-coded lists of which `.json` files are actually JSONC. Gross.

afiori · on May 22, 2023

YAML is way more complex and JSONC is not JSON 1.1

SPBS · on May 21, 2023

The is great news, thank you for your work on this.

ohazi · on May 21, 2023

> JSON is basically perfect

Until you realize you can't actually store real integers because every number in js is a float...

Waterluvian · on May 21, 2023

JSON’s numbers are not IEEE-754. They’re numbers with an optionally infinite number of decimal places. It’s up to a parser to handle it. Python can parse these into integers if there isn’t a decimal place.

It’s in the name, but be careful not to get confused with JSON being JavaScript.

dijit · on May 21, 2023

You wrote this as if it’s a defense but honestly I feel even more terrified of JSON numbers now than I was before entering this thread, and before reading your comment.

Not following a set standard is undefined behaviour, leaving it up to the implementation is a large problem in other areas of computer science. Such as C compilers.

crdrost · on May 21, 2023

Yes but this is a necessary limitation for all human readable numbers. The context decides what to deserialize into and different contexts/languages will choose bigint vs i64 vs u64 vs i32 vs double vs quad vs float, whatever is convenient for them.

Heck, some of them will even choose different endian-ness and sometimes it will matter.

I still remember the first time I dealt with a Java developer who was trying to send us a 64-bit ID and trying to explain to him that JavaScript only has 52-bit integers and how his eyes widened in such earnest disbelief that anybody would ever accept something so ridiculous. (The top bits were not discardable, they redundantly differentiated between environments that the objects lived in... so all of our dev testing had been fine because the top bits were zero for the dev server in Europe but then you put us on this cluster in your Canadian datacenter and now the top bits are not all zero. Something like a shard of the database or so.) We have bigints now but JSON.parse() can't ever ever support 'em! "Please, it's an ID, why are you even sending it as a number anyway, just make it a string." But they had other customers who they didn't want to break. It was an early powerful argument for UUIDs, hah!

Waterluvian · on May 21, 2023

It also means you can use JSON for incredibly high precision cases by making your parser parse them into a Decimal format. You couldn’t do this if you specified these limitations into the language.

Edit: Omg that story. Eep. I guess if someone provided too-large numbers in a JSON format, you could use a custom parser to accept them as strings or bigints. Still, that must have not been a fun time.

crdrost · on May 21, 2023

Yeah I believe I hand patched Crockford’s json2 parser? It was something like that.

Waterluvian · on May 21, 2023

JSON isn’t intended to narrow all details. That’s up to the producer and consumer. If you use JSON you will specify these details in your API. JSON isn’t an API.

I wonder how many times this gets violated though, and how many times this “I dunno… you decide” approach causes problems.

magicalhippo · on May 21, 2023

If you want something stricter, specify it in the JSON Schema and use that[1].

You could declare your own "int32" type[2] for example, and use that. Then validate the input JSON against the schema before parsing it further.

[1]: https://datatracker.ietf.org/doc/html/draft-bhutton-json-sch...

[2]: https://json-schema.org/draft/2020-12/json-schema-core.html#...

monsieurbanana · on May 21, 2023

You could invent a language that represents data that is very explicit about having integers, the implementation in javascript would still spit out floating values, because that's all the language has.

So either you don't target javascript (which would be a bit silly in the case of JSON), or you go the other way and forbid integers, even in languages that do support them. Which is also kind of silly.

Ultimately the real issue is that javascript doesn't have integers and if you're interacting with it, you need to be aware of that, JSON or not.

nly · on May 23, 2023

Doesn't matter.

The baseline is anything written in C and C++, which don't have bignum or decimal types and so more or less always parse JSON numbers to either int64 or double, at best.

gliptic · on May 21, 2023

JSON allows you to store arbitrarily large integers/floats. It's only in JS this is a problem, not if you use JSON in languages that support larger (than 54-bit) integers.

the_gipsy · on May 21, 2023

That's the freedom of unspecified behavior.

marcosdumay · on May 21, 2023

As long as the same person is on both sides of a communication channel, he has total freedom on what to say and will understand it flawlessly!

That's what standards are for, isn't it?

no_wizard · on May 21, 2023

Annoyingly, it also doesn't support BigInt, which would alleviate this problem in JS as well

Simran-B · on May 21, 2023

A number in JSON can have an arbitrary number of digits, i.e. it can represent any BigInt value.

Supermancho · on May 21, 2023

> A number in JSON can have an arbitrary number of digits, i.e. it can represent any BigInt value.

In my experience, violating type constraints causes problems in downstream systems (usually with parsing or trying to operate on invalid values).

Number, as defined by the JSON Schema spec. A 32-bit signed integer. It has a minimum value of -2,147,483,648 and a maximum value of 2,147,483,647

BigInt is defined by various (MSFT, MySQL, etc): -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807

Most systems use a JSON String for large numbers, out of necessity, not JSON Number.

ghusbands · on May 22, 2023

In context, BigInt refers to arbitrary precision integers [1], rather than any particular size of integer, hence "arbitrary number of digits".

[1] https://v8.dev/features/bigint

Kwpolska · on May 21, 2023

If both sides are using a language with integer types, this is a non-issue. JSON does not prescribe the number types in use, so the implementations may just say that the field contains 64-bit integers, and just parse them to and from the usual int64 type of their language. It is also legal for JSON parsers to parse numeric literals into an arbitrary-precision decimal type instead of IEEE 574 floats.

josephcsible · on May 21, 2023

You can store arbitrary precision numbers in JSON. The spec explicitly doesn't lock you into floats or any other specific number format.

Spivak · on May 21, 2023

Only if you are both sides of the transmission. If you're sending JSON to code you didn't write you will eventually get bitten by software lossy re-encoding. Lots of places use strings for this reason.

It's like API's that mess up the semantics of PUT/GET so implementing idempotency is extra annoying.

somewhereoutth · on May 21, 2023

But JSON is strings. So 123.4 is essentially "123.4" but with the indication that it is supposed to be semantically a numerical value.

josephg · on May 21, 2023

Right. I want to see an indication of what sort of numerical value it is. Big integers interpreted as floats lose precision. And floats decoded as integers truncate anything after the decimal place. JSON makes it way too easy to get this stuff wrong when decoding.

somewhereoutth · on May 21, 2023

If it has a decimal point then it is a decimal. And if it doesn't (or if it only has zeros after the point) then it's an integer. JSON is absolutely unambiguous as to the actual numerical value - how badly that gets translated into the decoding language is entirely on that language.

josephg · on May 22, 2023

This isn't right. JSON can also store exponential numbers (eg {"google": 1e+100}). You could decode this to an arbitrary-sized BigInt, but I can make you waste an arbitrary number of bytes in RAM if you do that. And even then, "look for a decimal point" doesn't give you enough information to tell whether the number is an integer. Eg, 1.1e+100 is an integer, and 1e-100 is not an integer.

One of JSON's biggest benefits is that you don't need to know the shape of the data when you parse. JSON's syntax tells you the type of all of its fields. Unfortunately, that stops being true with numbers as soon as double precision float isn't appropriate. If you use more digits in a JSON number, you can't decode your JSON without knowing what precision you need to decode your data.

Even javascript has this problem if you need BigInts, since there's no obvious or easy way to decode a bigint from JSON without losing precision. In the wild, I've seen bigints awkwardly embedded in a JSON string. Gross.

Putting responsibility for knowing the number precision into the language you're using to decode JSON misses the point. Everywhere else, JSON tells you the type of your data as you decode, without needing a schema. Requiring a schema for numbers is a bad design.

somewhereoutth · on May 22, 2023

Ok so it allows e notation, but still the actual numerical value is unambiguous. You could parse that into a data structure that (for example) stores the mantissa and exponent as (arbitrarily large) integers. Again, that most languages try to shoehorn decimals into floats or whatever is on those languages.

Spivak · on June 4, 2023

In court you would be right, in practice it's on JSON. Requiring an arbitrary precision math library to correctly parse JSON is just not going to happen. The only language I know that even does this out of the box is Python with their automagic best numeric type. Even Ruby which is dynamic to a fault only gives arbitrary precision for integers and parses JSON numbers with decimals as floats.

josephg · on May 21, 2023

True. But the spec also doesn’t provide a way to tell if a stored number should be decoded as a float or an integer - which makes it a right pain to use correctly in most programming languages. I’d love it if json natively supported:

- Separate int / float types

- A binary blob type

- Dates

- Maps with non string keys.

Even javascript supports all this stuff now at a language level; it’s just JSON that hasn’t caught up.

bccdee · on May 21, 2023

Eh, I'm happy with json as being explicitly unicode. Blobs can be base64-encoded. And date parsing invites weird timezone stuff—I'm happy to stick dates in strings and let the programmer handle that. I suppose a good json parser could insist number literals are ints unless you append a ".0", obviating the need for explicit integers, but that feels a bit kludgey. And I agree about the numeric map keys.

__s · on May 21, 2023

Except JSON can't even serialize all js numbers when it comes to NaN or infinity

data-ottawa · on May 21, 2023

You can store the first 2^53 integers with either sign, and if you need accurate integer values beyond that size you can stringify them and parse as big ints.

It’s not ideal, but 2^64 integers is also finite.

mst · on May 21, 2023

53 bit integers should be enough for anyone

(in practice for config it usually is but enforcing it is horribly patchy)

throwaway894345 · on May 21, 2023

> Until you realize you can't actually store real integers because every number in js is a float

JSON !== JS

KronisLV · on May 21, 2023

> JSON is basically perfect if it allowed trailing commas and comments.

I agree, especially in regards to the comments, because sometimes the data itself isn't enough and additional human-readable context can be really useful!

In that regard, JSON5 is a wonderful idea, even if sadly it isn't widespread: https://json5.org/

It also supports the trailing commas and overall just feels like what JSON should be, to make it better without overcomplicating it.

nikeee · on May 21, 2023

I like the idea of JSON5, but it allows a bit too much in my opinion. For example, why add the single quote? Why hexadecimal identifiers?

evntdrvn · on May 21, 2023

You might be interested to check out JSONC and JWCC/HuJSON

https://nigeltao.github.io/blog/2021/json-with-commas-commen... https://github.com/tailscale/hujson

veidr · on May 21, 2023

JSON is trash and should never be used in any human-interfacing context, so I'm super skeptical that there's any utility in trying to fix it; that would just delay its demise, to the detriment of humankind.

But if you did want to fix JSON, the yes, trailing commas and comments are the absolute minimum bar, but single quote is actually probably the the third absolute must-fix.

The reason is just that so much JavaScript tooling is now configured to autoformat code (including the JSON bits) to swap " to ', thanks in large part to Prettier (which also should almost never be used, sigh, but that's a topic for another HN bikeshed...)

hexadecimal identifiers, yeah, nah

NoahKAndrews · on May 21, 2023

Prettier defaults to double quotes.

I'm curious what you have against Prettier.

chubot · on May 21, 2023

I kinda agree, but I also think then you want unquoted keys

   {name: "value"}

And then maybe 123_456 syntax

It's a slippery slope ... pretty soon it's hard to write a JSON parser, and there are more bugs

The trailing comma one is trivial, I'll grant that

marcosdumay · on May 21, 2023

Trailing commas a very highly impactful to the users, and trivial to the language writers.

The 123_456 syntax comes close to that, but is much less impactful.

None of those will make the language hard to parse.

Spivak · on May 21, 2023

As much as I think it's annoying I think no trailing commas enforces good coding hygiene and basically forces you to use a real encoder rather than as hoc string manipulations.

rand_flip_bit · on May 21, 2023

This is such a bad take

Spivak · on May 21, 2023

Care to like, elaborate? Stopping people from doing brittle stuff like

    print("{")
    for elem in list:
        print('"key": "value",')
    print("}")

seems like immediate worth.

alwaysbeconsing · on May 21, 2023

That's not a very big hurdle to overcome, though:

    print("{")
    contents = ""
    for elem in list:
      contents += '"key":  "value",'
    print(contents[:-1])
    print("}")

marcosdumay · on May 21, 2023

You are talking about data transfer encoding. Configuration languages are a completely different beast.

hellcow · on May 21, 2023

Multi-line strings are another weakness. Sometimes you don’t\nwant\nto\nwrite\nthis\nway.

camgunz · on May 21, 2023

I thought the triple quotes let you avoid that, but I assumed, I haven't checked.

arp242 · on May 21, 2023

Yes, that's correct, but I think the previous poster was talking about JSON.

3np · on May 21, 2023

Are you talking about https://github.com/toml-lang/toml/pull/904 which is merged for 1.1, or something else?

bdhcuidbebe · on May 21, 2023

The poroblem now becomes incompatibility about what a toml document is.

burntsushi · on May 21, 2023

Good thing TOML was never intended to be a JSON replacement.

s3v · on May 21, 2023

Everything is perfect except for the reasons it isn't.

musicale · on May 22, 2023

I kind of like the text property list format from NeXTSTEP, used in GNUstep and (formerly) in macOS.

Apple's XML plist format seems like a mistake, though maybe the newer JSON format is OK.

> JSON is basically perfect if it allowed trailing commas and comments

Apparently Apple actually supports JSON5, an extended JSON based on ES5 that allows trailing commas, comments, and (my favorite!) unquoted key names, among other convenient features.

pydry · on May 21, 2023

I think this was partly what the pytoml author was alluding to when he slated the format:

https://github.com/avakar/pytoml/issues/15#issuecomment-2177...

Datetimes were clearly a mistake to include too. It took the M out of TOML.

ojosilva · on May 22, 2023

I find that YAML can be, if stripped down to the nice parts, a delightful config file structure.

Here's the toml.io example in perfectly legal YAML but using only YAML's nicer, JSON-like, compatibility grammar:

    title: "TOML Example"

    owner: {
        name: "Tom Preston-Werner",
        dob: 1979-05-27 07:32:00 -08:00
    }

    database: {
        enabled: true,
        ports: [8000, 8001, 8002],
        data: [
          ["delta", "phi"],
          [3.14]
        ],
        temp_targets: { cpu: 79.5, case: 72.0 }
    }

    servers: {
        alpha: { ip: "10.0.0.1", role: "frontend" },
        beta: { ip: "10.0.0.2", role: "backend" },
    }

If I were on the YAML board (?) I would push this or a similar subset (JAML? DUML? DUMBL?) to be implemented by parsers in every language: yaml.parse(yamlString, { jamlMode: true }). But it already works today anyway if you stick to the format. And that's what I use for my apps.

Multi-line strings in YAML are also very similar to TOML and you can ignore all different character mixups and stick to '|' (w/ newlines) and '"' (no newlines).

    # same as " hello world! "
    str1: "
        hello
        world!
      "

    # same as "apples\noranges\n"
    str2: |
      apples
      oranges

Most numeric TOML examples work too, minus a few bells and whistles like numeric separators '_':

    # integers
    int1: +99
    int2: 42
    int3: 0
    int4: -17

    # hexadecimal with prefix `0x`
    hex1: 0xDEADBEEF
    hex2: 0xdeadbeef

    # octal with prefix `0o`
    oct1: 0o01234567
    oct2: 0o755


    # fractional
    float1: +1.0
    float2: 3.1415
    float3: -0.01

    # exponent
    float4: 5e+22
    float5: 1e06
    float6: -2E-2

    # both
    float7: 6.626e-34

    # infinity
    infinite1: .inf # positive infinity
    infinite2: +.inf # positive infinity
    infinite3: -.inf # negative infinity

    # not a number
    not1: .nan

bccdee · on May 21, 2023

It kills me just how close JSON was to getting it right. If JSON had been JSON5 [1] instead, YAML and TOML probably wouldn't exist.

[1]: https://json5.org/

atoav · on May 22, 2023

I think JSON syntax is more prone to user syntax errors. And we are talking about syntax errors by the kind of user that neither knows what a "syntax error" nor "JSON" is.

Hence the "O" in "TOML" ("Obvious"). And this is the use case for TOML, simple user facing configuration that they are very likely to just get right.

JSON is fine for more intricate data structures or very complex configuration, but if you just need them to enter a few numbers and booleans it is overkill.

miki123211 · on May 21, 2023

Maybe you could even skip the commas entirely (or threat newlines as commas if possible?) That, along with unquoted keys, would make JSON perfect for me.

In such a hypothetical format, eliminating nulls entirely should Also be considered, the difference between a missing key (undefined) and a null value is significant in JS, but other (particularly statically-typed) languages struggle with differentiating those two cases, and this leads to `serialize(deserialize(a))` representing a different document than `a`.

wewxjfq · on May 21, 2023

The problem with comments in configuration files is that they don't survive a `load -> native object -> save` round-trip.

jen20 · on May 21, 2023

That sounds more like a problem with deserializing configuration instead of parsing.

riffraff · on May 21, 2023

Should a configuration language be a serialization language?

I'm unconvinced.

ilyt · on May 21, 2023

They could if you parsed it into syntax tree wit some methods to access keys instead of parsing into native struct. I think I saw YAML parser doing it...

rwmj · on May 21, 2023

Augeas solves this for many config file formats: http://augeas.net/

ithkuil · on May 21, 2023

I played around with "format preserving" edits of a few text formats, including nested formats (a JSON inside a TOML inside a YAML etc)

https://github.com/mkmik/knot8

peteradio · on May 21, 2023

Comments lie, I think it would make sense to destroy them in load to native.

r3trohack3r · on May 21, 2023

If you parse JSON as YAML you can add comments!

YAML is a proper superset of JSON.

slondr · on May 21, 2023

Not really. https://stackoverflow.com/questions/21584985/what-valid-json...

wkdneidbwf · on May 21, 2023

haha, this exact thing is my biggest gripe with toml

edit: just found https://github.com/toml-lang/toml/discussions/915#discussion..., super happy to be able to rest this out early

rwmj · on May 21, 2023

And if you fixed numbers so you could use 64 bit integers.

eviks · on May 23, 2023

still won't be prefect, all those extra quotes are bad for a heavily used human readable configs format

conaclos · on May 21, 2023

I pretty like the idea of a superset of JSON that supports (1) comments, (2) trailing commas, (3) unquoted properties, (4) optional {} for the root object, (5) multi-line strings, (6) number separator.

JSON6 proposal [1] supports all of this, except (4). Unfortunately it also supports more and make the spec a bit too complex for my taste (JSON has a compact spec; any extension should honor this). Same issue with JSON5 proposal [2].

EKON [3] is certainly the best candidate. However, it didn't get any traction.

[1] https://github.com/d3x0r/JSON6 [2] https://json5.org/ [3] https://github.com/ekon-org/ekon

chrismorgan · on May 21, 2023

I strongly dislike the idea of optional {} on the root object. It’s weird special-casing that adds complexity (code and cognitive) for no adequate reason, destroying the neat contextless recursion of parsing. Remember also that objects aren’t the only valid JSON values; and why should objects be privileged over, say, arrays? You could make [] optional too without introducing actual grammatical ambiguity, other than deciding what to do about the empty string (which “optional {} on the root object” would make valid, unless you special-case it further). But all of these cases require that humans and computers alike look ahead, rather than seeing { and immediately knowing what they’re dealing with.

elondaits · on May 22, 2023

Arrays at the root of JSON are dangerous and best avoided:

https://stackoverflow.com/questions/3503102/what-are-top-lev...

chrismorgan · on May 22, 2023

The matter of literals being interceptable due to using the current value of globals like Array was fixed across the board over a decade ago. You don’t need to worry about it in the slightest.

(Exploits also depended on a form of cross-site request forgery that (a) has been well-understood and avoided for fifteen years now (and with a perfect solution available for five years via the SameSite cookie attribute), so if you’re affected you very probably messed up in other exploitable ways too, and (b) is often even protected by default now: Chromium switched the default to SameSite=Lax in early 2020, so the sensitive cookie would need to be explicitly set with SameSite=None in order to be vulnerable at all. Safari and Firefox haven’t yet shipped this behaviour, though they all agree they want to, since it does break some older sites.)

tadfisher · on May 22, 2023

In the decade (or more?) since that was a problem, ES added JSON.stringify(). Nobody runs eval() to parse JSON anymore, and moreover, the root cause of the exploit (CSRF) has been addressed with CORS and sane default browser policy.

chrismorgan · on May 22, 2023

You’re missing some of the nuance of the bug.

CORS only stops you from fetching and reading resources, not from evaluating JavaScript with <script src="…">. (Fun related topic: JSONP.) This vulnerability depended upon the fact that a JSON array literal is a valid JavaScript program that just does nothing (unlike a non-empty object literal, since its opening { will be treated as a block, and so `{"key":` triggers a syntax error). Thus, you could use <script src="/path/to/json/array"></script> and it’d run just fine, doing nothing—except that back then you could intercept array and object creation. That was the crux of the vulnerability, and that was fixed in ECMAScript 5.

As for sane default browser policy: the only policy that has changed here is SameSite=None → SameSite=Lax on cookies, and that has actually still only shipped in Chromium-family browsers.

codeflo · on May 21, 2023

JSON as a computer-generated serialization format doesn't need any of those features. But for human-written configuration files, at the very least trailing commas and comments are basically a necessity. The question then is which features to pick beyond that (in my humble opinion, none), and how such a new standard would gain traction. JSON5, JSON6, JSONC all basically attempt to solve the latter problem by branding, but so far, mostly have failed.

It doesn't help that there's an insane amount of bike-shedding going on in this space. The people who have forked JSON5 to create JSON6 for example have created an endless amount of confusion over which standard to use, harming both of them. I think they might want to think really hard whether the inclusion of octal literals was really worth the hostile fork.

atombender · on May 21, 2023

I like Google's Jsonnet [1], which has all of this except for 4.

Jsonnet is quite mature, with fairly wide language adoption, and has the benefit of supporting expressions, including conditionals, arithmetic, as well as being able to define reusable blocks inside function definitions or external files.

It's not suitable as a serialization format, but great for config. It's popular in some circles, but I'm sad that it has not reached wider adoption.

[1] https://jsonnet.org/

jrockway · on May 21, 2023

Yeah, I like jsonnet a lot for config files.

ilyt · on May 21, 2023

>I pretty like the idea of a superset of JSON that supports (1) comments, (2) trailing commas, (3) unquoted properties, (4) optional {} for the root object, (5) multi-line strings, (6) number separator.

You're 90% way there to YAML.

And YAML 1.2 cleaned up most of the annoying YAML edgecases too

4ad · on May 21, 2023

Among types and other useful properties, CUE supports everything you asked for. We're even considering adding a "data-only" mode to CUE which would be exactly what you asked for.

https://cuelang.org

conaclos · on May 21, 2023

Is there any concise representation of the language spec? Similar to the spec of JSON [1].

[1] https://www.json.org/json-en.html

4ad · on May 21, 2023

No, we do have a spec[1], but it cannot be as concise as the JSON link posted because the JSON link is mostly only about syntax, and our spec discusses both syntax and semantics. And of course, we are a much richer language than JSON, with many more features and computational behavior.

That being said, it would be nice to use railroad diagrams to describe syntax in our spec.

[1] https://cuelang.org/docs/references/spec

analog31 · on May 21, 2023

Just to add one more feature request... ;-)

Comments are preserved when you read a file, change some of its contents, and write it back out again.

hnlmorg · on May 21, 2023

I have zero issues if parsers choose to do this but I wouldn’t argue that it shouldn’t be a required part of the specification. If comments need to be preserved then they’re part of the schema and should be a (for example) string field rather than comment.

eviks · on May 23, 2023

What's the connection? They can't be a string field because semantically they're tied to another field they're commenting, and they need to be preserved because that info is valuable

Why do you require conversion in this use case?

himujjal · on May 22, 2023

Hi. I am the author of EKON. I realized YAML was doing too many things. EKON was meant to be an extension of JSON5. Although right now I realise its also doing too many stuffs under the hood.

I should rewrite the project again. But be as simple as possible.

That was my first C project ever during my University. :P

solarkraft · on May 21, 2023

I have a wish I think is superior to trailing commas: Optional commas, like semicolons are optional in JS. Human-readable objects and arrays, like JS instructions, are nearly always already newline-separated.

kayodelycaon · on May 21, 2023

Let me introduce you to our lord and savior: YAML. ;)

hyperhopper · on May 21, 2023

Ah yes, the language where a string can magically become a boolean and cause an error. And you can never be sure what is legal because of so many changes in parsing legality between versions.

Yaml is one of the worst config languages.

rgovostes · on May 21, 2023

Prior to YAML 1.2, unquoted numeric values containing : were interpreted as base 60, a common trip-up for Docker Compose port mappings. The intent was to make it easier to write times, i.e., 2:00 == 120.

FireInsight · on May 21, 2023

YAML is great except for everything wrong with it, YAML is the worst except for all the others, etc., etc.

_ZeD_ · on May 23, 2023

Yaml is plain unreadable for me tho

friend_and_foe · on May 21, 2023

I use TOML and JSON for everything. I like them both quite a bit. TOML gets a lot of criticism for not being JSON and JSON gets a lot of criticism for... not being entirely fit for it's stated purpose. However, they're both fantastic if you use them for what they're good at: TOML for human readable configurations that you expect a user to modify, JSON for representing data that a machine will read all of the time but that you want the user to be able to look at and grasp. I don't think either of them really need to grow to be good at both, they're fine as they are, just pick the right tool for the job.

Simran-B · on May 21, 2023

I don't understand the appeal of TOML. Why not use YAML instead? Seems a lot more "obvious" to read and write to me. And it's the best I know that is strong in both, human and machine readability.

MrJohz · on May 21, 2023

YAML has a lot of extra stuff going on that can cause accidents if you don't take care. The classic example is the "Norway problem" where "no" (the country code for Norway) is parsed as "false" instead. If "no" is used as a key, this can cause the Norwegian data to disappear or to throw strange errors on load. The other big issue is that, by default, it allows relatively unrestricted code execution in many environments. If a user is ever able to submit data that is interpreted by a YAML parser, they can run arbitrary code, and potentially even commands.

The fixes for these are both fairly well known (always quote strings and keys, use safe_load or an equivalent API), but it's very easy to make a mistake.

TOML, by contrast, is much simpler - strings are strings are strings, they must always be quoted but other than that behave pretty much as expected. Similarly, there is no mechanism by which a TOML file can instruct the parser to execute arbitrary code as part of deserialisation, which makes it (a) a lot simpler, and (b) a lot safer.

daveevad · on May 21, 2023

Someone at work recently got bit by unexpected YAML parsing a git commit hash that contained a substring which was a valid number in scientific notation (IIRC 5e38031).

jerpint · on May 21, 2023

AceJohnny2 · on May 21, 2023

YAML is interesting, because at first glance it looks like a pretty convenient, human-readable syntax, it's got lists, dicts/mappings, strings, numbers, booleans... simple enough for 90% of the use-cases, and focusing on "human-readable, right? But look at little deeper and one uncovers some horrors.

In particular, the plethora of boolean values... here let me just grab the regex from the spec [1]:

    y|Y|yes|Yes|YES|n|N|no|No|NO
    |true|True|TRUE|false|False|FALSE
    |on|On|ON|off|Off|OFF

At that point why limit yourself to just the English language?

Then there's the language-specific types or, as YAML calls them, "local tags". The stuff preceded by `!` like

    !!map {
      ? !<tag:yaml.org,2002:str> "foo"
      : !<!bar> "baz"
    }

At that point, you've exceeded the scope of "human-readable".

I'll admit there are some other, er, "complications" that I actually like. In particular, anchors (`&FOO`) and aliases/references (`*FOO`) [2] make it possible to describe arbitrary graphs, and I find it very useful to factorizing blocks. Also, "folding" [3] text, which allows you to spread value/text across multiple lines for readability, while respecting the surrounding indentation, is neat.

[1] https://yaml.org/type/bool.html

[2] https://yaml.org/spec/1.0/#syntax-anchor

[3] https://yaml.org/spec/1.0/#id2566944

Spivak · on May 22, 2023

YAML 1.2 fixed the boolean thing.

Language tags can get weird but they're ultimately just additional types implemented by your yaml parser you should basically always turn off/not turn on unless you know you want it. You can serialize classes in your language without it.

Tags are really good for readability if you use them thoughtfully

    value: !!binary
       base64string

    # ordered map
    value: !!omap
      a: 123
      b: 455

The reason that example is so ugly is it's using the "complex mapping key" syntax which is unbelievably ugly but if you're using objects as keys in your map you abandoned sanity long ago.

Gare · on May 21, 2023

> YAML has a lot of extra stuff going on that can cause accidents if you don't take care. The classic example is the "Norway problem" where "no" (the country code for Norway) is parsed as "false" instead. If "no" is used as a key, this can cause the Norwegian data to disappear or to throw strange errors on load.

This was fixed in YAML 1.2, but the problem is that almost nobody uses (and there is patchy support for) it.

tedivm · on May 21, 2023

The vast, vast majority of issues with YAML can be solved by quoting all strings. Then a lot of the type inference magic goes away.

rayrrr · on May 21, 2023

YAML is not reliably machine-readable, nor was it designed to be. TOML was designed to be machine-readable, but otherwise fulfilling a similar use case as YAML.

danmur · on May 22, 2023

I think YAML was designed to machine readable. What would you do with it otherwise? Nobody's writing poetry in it (that I know of)

rascul · on May 21, 2023

YAML is more complex and difficult to write than TOML.

OtomotO · on May 21, 2023

Yaml is incredibly complex. KISS

polski-g · on May 22, 2023

Yaml has terrible type enforcement. If you have "no" as the value, sometimes it will be a boolean and sometimes a string.

danmur · on May 22, 2023

I assume neither of us has experienced these YAML bites that others have :).

I don't mind TOML but I don't find it either intuitive or obvious. The syntax for how nested things are flattened I just find really hard to read and write. It's fine though, not that big a deal.

pachico · on May 21, 2023

My main issue with TOML is precisely described in the homepage example:

[servers]

[servers.alpha]

is error prone. An sub heading shouldn't need to know about its parent, much less to have to copy it entirely.

pphysch · on May 21, 2023

But then you need new syntax to indicate subheadings, which is also (probably less) error prone because you might omit it.

ryan-duve · on May 22, 2023

I think it is not needed? Or did you mean something other than this?

    Python 3.11.3 (main, May  4 2023, 05:53:32) [GCC 10.2.1 20210110]
    Type 'copyright', 'credits' or 'license' for more information
    IPython 8.13.2 -- An enhanced Interactive Python. Type '?' for help.
    
    In [1]: !cat /tmp/x.toml
    [servers.alpha]
    x = 10
    
    In [2]: import tomllib
       ...: 
       ...: with open("/tmp/x.toml", "rb") as f:
       ...:     t = tomllib.load(f)
       ...: 
       ...: t
    Out[2]: {'servers': {'alpha': {'x': 10}}}

petre · on May 21, 2023

How do you convert it into a nested object then? You could use deeply nested objects like JSON or YAML does, but then it defeats the purpose of an easy to read configuration format. This is also a feature of Java property files.

mdaniel · on May 21, 2023

> This is also a feature of Java property files.

To the very best of my knowledge, Properties are merely Map<String,String> (see: https://docs.oracle.com/en/java/javase/11/docs/api/java.base... ) and any subsequent interpretation happens in the application (e.g. Spring treating `foo[1]=bar` as a list-ish assignment but the actual key returned from `getProperty` is the string literal as written `foo[1]`)

enriquto · on May 21, 2023

Graybeard opinion: all of toml, yaml, xml, json, kdl, etc., are ridiculously over-engineered for simple configuration files. You can nearly always fulfill all of your needs with a simple two-column text file of key=value pairs that can be parsed by a trivial call to fscanf(3) or whatever your laguage supports (yes, even correctly discarding comments). The world would be a better place if people just gron'd their jsons, xml2'd their xmls and so on, thus pruning the universal dependency tree by a considerable amount.

adgjlsfhk1 · on May 21, 2023

How does this deal with needing to represent nested data? (the example coming to me now is TOML as a list of libraries each of which has a number of dependencies). I guess you can always do it by duplication, but for a human oriented format, that sounds pretty painful.

enriquto · on May 22, 2023

> for a human oriented format, that sounds pretty painful

As a human, I prefer to edit by hand an unordered list of key-value pairs (even if the keys are long), than a json object. I often find myself using gron/ungron to edit json data by hand in a comfortable manner. The fact that several implementations of gron exist suggests that I'm not alone in this preference. Editing json requires a global understanding of the structure. But on a list of keys/values there is no order, and each line stands for itself and is understandable in isolation.

xedrac · on May 22, 2023

With freeform keys, you can nest all day. But you'll likely just reinvent the wheel when you want certain features.

aaa.bbb.enabled = true

aaa.ccc.enabled = true

Rapzid · on May 21, 2023

How do you handle data structures in your key/value world? Parse it out of some adhoc key string schema?

rfoo · on May 21, 2023

--flagfile gets me really far but to be fair it sucks when you need nested kv

enriquto · on May 21, 2023

just use longer keys (with an ad-hoc separator, if you want)

rfoo · on May 22, 2023

What about lists? Something like:

  --allowed-downtime-0-start=20230523170000Z
  --allowed-downtime-0-end=20230523170500Z
  --allowed-downtime-1-start=...

Or

  --allowed-downtime-start='20230523170000Z,...'

I can live with them, but neither looks clean :(

Genuine question, as I actually like the plain old flagfiles/Java-properties-ish style.

enriquto · on May 22, 2023

The output produced by gron looks quite clean to me. You just write your file as if it was a list of js variable assignments.

    allowed-downtime[0][0] = 20230523170000Z
    allowed-downtime[0][1] = 20230523170500Z
    allowed-downtime[1][0] = ...

or if you absolutely want the names, you can make a "list of dictionaries":

    allowed-downtime[0].start = ...

throwawaaarrgh · on May 21, 2023

TOML is, indeed, a human friendly config format. Whereas INI is a simple config format.

YAML/JSON/CSV/XML/etc are not config formats at all. They are data serialization formats. If you don't know the difference, you probably didn't get a CS degree, or read or understand the specs.

Config formats should be tailored to the application, because the entire point of a config file is to make it easy for a human to configure one specific application. That's why there are 100 different config file formats for old UNIX programs. For each one, it's vastly easier to use that one config format for that one application, than it would be to force some arbitrary data serialization format to represent what the human really wanted the program to do.

TOML is better than YAML or INI for a config file, but what's best is rolling your own format. It literally leads to better outcomes. The user can express what they want faster, easier, clearer, and the application doesn't have to struggle with alternate implementations of the spec for some poorly defined data serialization format.

majewsky · on May 21, 2023

Where do you draw the line between config file format and data serialization format? The first paragraph of the TOML spec says:

> TOML is designed to map unambiguously to a hash table.

Sounds like a data serialization format to me. Besides the obvious differences in design and stylistic choices, it only really differs from JSON or YAML in that the top-level object of the data structure must be an object. If this type of constraint is what makes it a config file format, I am going to change the world with my revolutionary new config file format "JSON, but the first byte must be {".

As another evidence for the blurriness of this line: Cargo uses TOML for its config file (Cargo.toml) which matches the usage that you describe. But then it also uses TOML for the serialization of its internal locking data structure in Cargo.lock, which is a file that humans are specifically not told to edit directly because it just contains serialized data. It just so happens to use TOML, probably because they already had a parser for that available at the time.

> TOML is better than YAML or INI for a config file, but what's best is rolling your own format. It literally leads to better outcomes.

Disadvantages of rolling your own format:

- You now have to write your own parser, which costs at least a bit of time and is going to be more prone to bugs than a widely-used deserialization library like Serde. I see that you dunk on data serialization formats being "poorly defined", but if the popular formats are poorly defined, then certainly the average tool developer is not going to do a better job when they actually want to implement the tool and not the parsing for a config file format.

- Users have to learn the format, and have to mentally switch between formats for each application. I just hate how every homegrown config format uses slightly different syntax, esp. which character starts a comment. With `config.yaml` or `config.toml`, I don't have to guess.

- If the config file is YAML or TOML, I immediately have some basic syntax highlighting in my editor.

throwawaaarrgh · on May 22, 2023

> Where do you draw the line between config file format and data serialization format?

It's a somewhat ambiguous concept. But generally speaking, config formats lack features that would be useful for a general purpose, and include features that are useful for a specific purpose. For example, it may lack a data type, but it may have some special characters that denote some string is a regular expression. The main difference being how easy it is for a human to deal with it.

The main purpose of a config format is for a human to tell a program how to operate. It is typically distinct from the actual input data used by an application to create output data. It's like the difference between **argv and stdin.

> You now have to write your own parser, which costs at least a bit of time and is going to be more prone to bugs than a widely-used deserialization library like Serde

Actually I argue the opposite. Parsers aren't compatible between implementations which leads to bugs. Specs aren't well understood by either users or programmers which leads to bugs. A home grown implementation doesn't need to be touched after it's first written, unless you're adding or changing the features of your configuration format.

> Users have to learn the format, and have to mentally switch between formats for each application. I just hate how every homegrown config format uses slightly different syntax, esp. which character starts a comment. With `config.yaml` or `config.toml`, I don't have to guess.

Most users never learn those formats properly, leading to false conclusions like the "Norway problem", which isn't a problem with the spec at all, it's a problem of users never reading the spec and attempting to write it (when YAML was never intended to be human-writeable, only human-readable; read the spec!).

Compare this to a home grown config file which is designed to be easy to write, adds functionality to make advanced behavior easier (without adding an entire Turing-complete DSL), and doesn't overload an existing format with broken and confusing changes (see Ansible's and other bastardized YAML hybrids).

"Simplified config file formats" are often just shitty generic DSLs that still lack functionality to actually make the user's life easier. Roll a custom format and hard things become easier [for the user].

alpaca128 · on May 21, 2023

TOML is nice as a kind of extended .ini format but I don't like the approach to avoiding nested elements.

I prefer KDL. It's like XML but with almost zero syntax. Probably not always the best idea because it's so niche but it's just really simple while having all the basics like comments and multiline strings.

grncdr · on May 21, 2023

KDL (assuming you mean https://kdl.dev/) looks great, thanks for mentioning it.

p0nce · on May 21, 2023

See also: https://sdlang.org/

camgunz · on May 21, 2023

Rather than a JSON replacement, I like TOML as a YAML replacement. It's a lot simpler, I'm not confused by things not having a preceding dash, indentation isn't significant, etc.

IshKebab · on May 21, 2023

I disagree. TOML is an INI replacement. They both suck at nested structures but are good if you just have a load of top level sections containing simple key-values.

YAML on the other hand is just as good at any nesting depth. I mean it's pretty awful at all depths, but the act of nesting doesn't make it any more awful.

matsemann · on May 21, 2023

I spent a whole day debugging an issue because an extra - dash in yaml.

I had something like

  rules:
   - name: Only allow X
   - condition: ip=123

The - in front of condition of course makes a new list element. So I now had 2 rules. One with no conditions and a name, and an unnamed one with conditions. Since they are or'ed together everything for some reason I couldn't figure out went through, as I thought I had a single rule..

I hate yaml.

IshKebab · on May 21, 2023

Yeah I've had this happen to me too. Truly awful format. I think the only reason it has caught on is because it has multiline strings, which are usually not great in other formats. Of course YAML fucks that up too by having a gazillion different kinds of multiline strings, distinguished only by impossible to remember symbols. But it does at least have them.

camgunz · on May 22, 2023

Yeah I guess we're saying the same thing: TOML as an INI replacement is more of a YAML as an INI replacement. Re: nesting, I think TOML's dotting is pretty elegant as it flattens the nesting you so often need in configs. It took me a while to get used to the double brackets (and it is pretty confusing that you can put them anywhere, not just after their parent section--what are the edge cases that benefit from this...) but now I'm bought in, to the point where I don't know if I've just gotten used to the madness or what. I like that those two things essentially flatten the config--if you were using the new multi-line-inline-tables (from the forthcoming 1.1) you could actually nest things but now I don't know how much I even want that.

vander_elst · on May 21, 2023

I wonder why text protobuf is not widely used for config. There is a formal description of the structure and the configuration structure can be statically checked.

skybrian · on May 21, 2023

Maybe because protobufs are specified in a language that’s different from everything else and needs its own parser and codegen? Also, there’s often more of an impedance mismatch with the basic datatypes of scripting languages. For example, people using protobufs will often specify int64 by default, when you don’t actually need 64 bits but need more than 32.

JSON support is often in a language’s standard library, or close, and its ambiguity around what numbers you can actually use can be seen as a worse-is-better approach.

(There are similar tradeoffs with not specifying a max length of a string. It’s awkward, nit-picky, and error-prone to pick a maximum, but there are practical reasons you don’t want to allow unlimited lengths in a UI.)

overgard · on May 21, 2023

Another alternative is HOCON: https://github.com/lightbend/config/blob/master/HOCON.md

I'd describe it as JSON-like with much more flexible syntax and the ability to reference itself.

Waterluvian · on May 21, 2023

Using JSON for configuration was and continues to be a terrible blunder.

TOML fills that role wonderfully. And it isn’t full of complexity that turns Yaml into a mess.

shaunpersad · on May 21, 2023

I've had the complete opposite experience. Neither YAML or TOML are very obvious to me. I always have to double and triple check that my syntax does what I assume it will do. I never have that problem with JSON.

Waterluvian · on May 21, 2023

Not adding comments to a config file is recipe for complete disaster. You can kinda fake it but it’s just a silly headache. Which is recognized given there’s now “JSONC” but now you’ve got even more confusion determining what systems can actually accept comments or not. That’s always fun.

I’m curious what parts of TOML do people find confusing? It’s such a tiny markup language.

eviks · on May 23, 2023

Bad nesting alone would make it a similar blunder for configuration

bruce511 · on May 21, 2023

I'm puzzling over where I would use this sort of thing, over say Json, or simple xml, and I realise that it's mostly useful in cases where you're expected to edit the configuration in a text editor.

That's not something I come across a lot - all my software (that I use) has a visual interface, and so raw editing of text/config files is not something we do.

So context really comes into play here. I can see how this would be very useful for some programs, and near useless for others. And more specifically useful for those who consume user-edited configuration files.

lexicality · on May 21, 2023

Curious at to what technologies you work with? The work that I do and tools that I use have all manner of configuration file syntaxes and languages and a big headache for doing sysadmin work is trying to remember if a particular .rc file uses INI, YAML, JSON, TOML or some home rolled nonsense.

rob · on May 22, 2023

At work, we needed a config file for some employees to sit alongside a custom (non-web) script I wrote. These employees have zero technical knowledge (like, they wouldn't know basic HTML), and I found TOML to be pretty easy for them to understand. It looks like this:

    # config.toml

    [[podcasts]]
    title="Podcast 1"
    export_label="Label 1"

    [[podcasts]]
    title="Podcast 2"
    export_label="Label 2"

It's flat for them, but gives me a nice `$podcasts` array of nested objects to work with and they don't need to worry about curly braces, indenting, etc. They can just copy/paste three lines if they need to "add" a new one.

Of course, my use case is pretty simple and these are simple string values, but I found it pretty nice.

Sujeto · on May 21, 2023

I moved some config files from json to yaml. Maintaining the documentation became a lot easier, because before I had a document describing every config option, which I had to maintain separately, instead now I have comments in the config file itself.

bonzini · on May 21, 2023

Note that now you have "the Norway problem" instead. https://hitchdev.com/strictyaml/why/implicit-typing-removed/

There's a whole spectrum of config formats including JSON, JSON5, strict YAML, YAML. It's quite messy.

riffraff · on May 21, 2023

The norway problem is not a problem if you use yaml 1.2.

Which is 14 years old.

EDIT: this sounded adversarial which was not my intention. Plenty of libraries do not support 1.2 which sucks, I just meant that it's something solved in theory a long time ago.

debugnik · on May 21, 2023

Could you explain further? The link contradicts this:

> The most tragic aspect of this bug, however, is that it is intended behavior according to the YAML 1.2 specification. The real fix requires explicitly disregarding the spec - which is why most YAML parsers have it.

riffraff · on May 22, 2023

the article is wrong, from what I can tell.

From https://yaml.org/spec/1.2.2/ext/changes/

> Changes in version 1.2 (revision 1.2.0) (2009-07-21)

> [...]

> Only true and false strings are parsed as booleans (including True and TRUE); y, yes, on, and their negative counterparts are parsed as strings. Underlines _ cannot be used within numerical values.

detaro · on May 21, 2023

The link is wrong. YAML 1.2 fixed this.

GuB-42 · on May 21, 2023

I found yaml to be surprisingly complex and tricky to write for how simple it looks.

It is made to be beautiful and easy for a human to read, but the trade-off are a lot of implicit rules you need to be aware of.

pydry · on May 21, 2023

The point of strictyaml is to fix that. The trade off is that you need to define a schema up front.

zaebal · on May 21, 2023

It also reminds me the syntax of INI files.

danielvaughn · on May 21, 2023

I just started a new job where we’re writing a lot of Python, and I’m seeing TOML everywhere

gscho · on May 21, 2023

In case anyone is interested I made an online toml syntax checker in the same vane as jsonlint.com and yamllint.com:

https://toml-lint.com

omoikane · on May 21, 2023

Previously (2020-09-10): https://news.ycombinator.com/item?id=24436550

There was also an earlier entry where it was "Tom's Own Markup Language" (2013-02-24):

https://news.ycombinator.com/item?id=5272634

mdswanson · on May 21, 2023

Looks like Microsoft's .INI file format.

IshKebab · on May 21, 2023

Naming it "Obvious" doesn't actually make it true unfortunately. JSON is still by far the most obvious format, and JSON5 fixes all the big issues with using it as a configuration format. I wish it had wider support. It's just so obviously better than TOML and YAML.

nicoburns · on May 21, 2023

If one is looking for an alternative config format, the best designed I've seen is https://github.com/gura-conf/gura

rascul · on May 21, 2023

Gura doesn't look too bad to me at a glance. Not sure if I'll attempt to use it in the future but I'll bookmark it for now.

eviks · on May 23, 2023

Thanks for the tip, pity they mandate 4 spaces, that's 2 too many

the_mitsuhiko · on May 21, 2023

The one thing I wish TOML had added was null. Today many TOML configs still have “nullable” keys which means you need to comment them out to unset them.

lexicality · on May 21, 2023

The Billion Dollar Mistake lives on.

Personally I'm happy it doesn't have explicit null and somewhat surprised it supports NaN (which I thought it didn't)

To my mind, a configuration file using "this key doesn't exist" as one of several valid options seems deeply unsound.

the_mitsuhiko · on May 21, 2023

> The Billion Dollar Mistake lives on.

I disagree. There is a big difference between configuration language formats and programming languages.

> To my mind, a configuration file using "this key doesn't exist" as one of several valid options seems deeply unsound.

I would agree, but in that case you now often need a secondary key to "turn something off" which is often not done.

lexicality · on May 21, 2023

> There is a big difference between configuration language formats and programming languages.

You're right! But in my opinion the difference is that a configuration file should have a default state that could be written to disk and loaded with no change in behaviour.

eg you can set `feature = "on"` or `feature = "off"` and if you don't set either it's the same as writing `feature = "off"`.

Having a secret third thing of `feature = null` should be outlawed and I'm glad TOML doesn't encourage it

the_mitsuhiko · on May 21, 2023

There is a secret third thing called "feature not in file" and most TOML implementation encourage it as "key not in file" is typically returned as `null` or `None`. That's how so many TOML files in practice end up with all these secret values.

It's also particularly odd in lists where I have seen `["foo", "bar", {}]` show up in the real world with `{}` being used as a replacement value for null.

lexicality · on May 21, 2023

In my experience (Python and Rust) missing keys are errors, not silently converted to None:

    In [1]: example = {"foo": None}

    In [2]: print(example["foo"])
    None

    In [3]: print(example["bar"])
    ---------------------------------------------------------------------------
    KeyError                                  Traceback (most recent call last)
    Cell In[3], line 1
    ----> 1 print(example["bar"])

    KeyError: 'bar'

    In [4]: print(example.get("bar", "sensible default"))
    sensible default

If configuration values were arbitrarily nullable I would either need to deal with the possibility of nulls ending up anywhere in my program (back to The Billion Dollar Mistake) or have to have checks on parsing to say which fields are nullable or not.

I acknowledge people have done silly things regarding missing keys already - but I think they should be discouraged from continuing to do so, not enabled.

rascul · on May 21, 2023

> In my experience (Python and Rust) missing keys are errors, not silently converted to None:

In Rust it's possible to deserialize to an Option.

https://play.rust-lang.org/?version=stable&mode=debug&editio...

donaldstufft · on May 21, 2023

How is checking for null any different than checking for the right type in general in TOML?

arp242 · on May 21, 2023

> There is a big difference between configuration language formats and programming languages.

In general: I agree. But configuration formats (including TOML) also need to map well to programming languages, so you can't seem them as entirely separate. One argument against Null is that it doesn't elegantly map to a data structure in all programming languages.

the_mitsuhiko · on May 21, 2023

> One argument against Null is that it doesn't elegantly map to a data structure in all programming languages.

TOML has plenty of constructs that don't map well to many languages. Plenty of language implementations do not know how to work with the datetime type, some TOML implementations barf on objects in lists. In some TOML implementations you also end up with nulls showing up when certain empty table constructs are used.

arp242 · on May 21, 2023

Making it map to every single language is always going to be hard, but most (mainstream) languages come with support for these kind of things in the language itself or standard library. That's not a good reason to add more though. It also pushes the complexity the complexity to the people using TOML in their programs, rather than the people writing TOML parsers: in some languages like Python or JavaScript you can "just" use a string and it's nullable, but in other languages you have to muck about with pointers, special types, options, etc.

kibwen · on May 21, 2023

> Plenty of language implementations do not know how to work with the datetime type

Right, and IMO datetime was a mistake to include in TOML.

afiori · on May 21, 2023

CSS has two magic values that would be useful in many configuration scenarios:

- unset: ignore values of this property

- initial: use the default value

These would be useful e.g. with overrides.

I agree that nulls in a config files are often not a good idea because it is not clear how they are going to be interpreted.

PS: Css also has inherit that could mean "use the local default" while initial could mean use system default.

lexicality · on May 21, 2023

Unfortunately that's not what `unset` does and I feel like the confusion it causes and the complication it adds to any CSS parser suggests it's a bad idea!

marcosdumay · on May 21, 2023

Still, initial and inherit are good things to have.

They are about as useful in a configuration language, where the cascading logic also applies.

afiori · on May 21, 2023

lol, I guess this might be an argument against null

bazoom42 · on May 21, 2023

> The Billion Dollar Mistake lives on.

If you are referring to Tony Hoare, the “billion dollar mistake” was not the existence of nulls per se, but that all references were nullable, with no support in the type system to distinguish between nullable and non-nullable references.

arp242 · on May 21, 2023

It's been discussed a few times over the years:

https://github.com/toml-lang/toml/issues/921

https://github.com/toml-lang/toml/issues/803

https://github.com/toml-lang/toml/issues/802

https://github.com/toml-lang/toml/issues/146

https://github.com/toml-lang/toml/issues/30

Obviously there is a bit of a demand, and it certainly has valid use cases, but also has downsides.

the_mitsuhiko · on May 21, 2023

I'm aware of the discussions. I also gave up trying to get NULL into TOML and I think I'm reference in one of those issues. Personally I do not create TOML configs where absence of key has a meaning, but I'm also exposed to other people's TOMLs where this is an actual issue with all the consequences this causes.

And that ignores the issue of nulls in arrays which are sometimes needed. For instance I have been exposed to parameterized SQL queries in TOML where you need to use the empty object to represent NULL, a particularly absurd incarnation of this.

arp242 · on May 21, 2023

I'm not saying there aren't reasonable and valid use cases, there obviously are. But there are also downsides, so the question is how it all balances out. Almost every feature that has ever been added to anything has reasonable and valid use cases, but clearly not everything should have all the features.

Guthur · on May 21, 2023

Well that would be null right? an absence of the thing.

TylerE · on May 21, 2023

Null is one way to represent such a value, but it’s hardly the only.

Option-style algebraic types are another, and more or less strictly superior if working in a language with first class support for such, because they force you to explicitly handle the null/None/Empty/whatever case if you’re doing something with it that could fail (basically anything besides assignment or passing it along).

Guthur · on May 21, 2023

Seems beyond the simplicity TOML was aiming for. Absence is absence in the file, in your programming language you can wrap in whatever trickier you like or is relevant.

TylerE · on May 22, 2023

The trick is distinguishing empty string vs no value vs key not being present.

Guthur · on May 22, 2023

Key not being present is no value, key present has value, empty string or otherwise, seems simple enough.

You can add advanced semantics on load, the file does nothing by itself after all.

TylerE · on May 22, 2023

It’s that “or” that bites you if you aren’t real sure of the semantics in play.

Guthur · on May 22, 2023

There is no 'or' for a given file, it is declaratively there or it is not. The 'or' semantics are part of your language environment not the toml file.