Bah. While YAML is far from perfect, it's fine for random config files written by humans. TOML is mostly better, but from its own homepage (https://toml.io/en/):
[servers]
[servers.alpha]
ip = "10.0.0.1"
role = "frontend"
[servers.beta]
ip = "10.0.0.2"
role = "backend"
is ugly to my eyes. And I'd rather swallow my own tongue than have to hand-edit XML in the most common case where there's not a dedicated editor for that specific doctype.
In fact, I'd echo the linked article's argument back: I don't know of a case where XML is the best option. For human-edited files, pick almost literally anything else. For serialization, JSON handles the common cases and protobufs-and-friends are better when JSON isn't enough. There's not a situation I can imagine where I'd use XML for a greenfield project today.
{
  "servers": {
    # Frontend server is called alpha
    "alpha": {"ip": "10.0.0.1", "role": "frontend"},
    # Backend server is called beta
    "beta": {"ip": "10.0.0.2", "role": "backend"},
  }
}
Or, depending on your preference:
servers:
  # Frontend server is called alpha
  alpha:
    ip: "10.0.0.1"
    role: "frontend"
  # Backend server is called beta
  beta:
    ip: "10.0.0.2"
    role: "backend"
All of them suck in their own way. All of them work fine with autocomplete, type analysis, and autoformatting. The more things change, the more they stay the same.
JSON lacks comments, that's the biggest differentiator in my opinion.
Your YAML example doesn't need double-quotes around the IPv4 addresses, but then very confusingly and problematically does need double-quotes around an IPv6 address, due to the colons.
This creates a serious footgun in Ubuntu netplan, leaving a server totally unbootable, but simultaneously not triggering "netplan try" as any sort of parsing problem:
And Javascript doesn't need semicolons in all but three exceptional cases, but I still consider it good form to strongly type values whenever possible.
Getting in the habit of not doing so will lead to schema violations, like the Netplan problem you linked, which can crash the program trying to read your config. If it bails out at an unfortunate time, like most networking tools seem to do, you'll need to use a recovery boot image or serial console to fix your config.
> This creates a serious footgun in Ubuntu netplan, leaving a server totally unbootable, but simultaneously not triggering "netplan try" as any sort of parsing problem:
Been there, done that. A good config file or linter should’ve complained and not allowed me to commit such a mistake.
JSON is by far the easiest to reliably parse. It doesn't rely on tabs or spaces which YAML suffers from. XML is just more verbose JSON without an array object, and has some redundancy in spec which is not a good design.
Lack of comments for JSON isn't a huge issue considering you can make the keys fairly verbose. And it would be actually pretty easy to add this into the spec, and parsers would still be backwards compatible.
It's my preferred configuration file format, it fixes all the problems I have with JSON (trailing commas, comments) without turning it into a mess full of gotchas like YAML.
A theory - they don’t want a third party code in a hot path (they do care about performance in vscode), they already have a very performant parser and they don’t want to add a complexity there.
All of them are easy to parse. I personally prefer the clarity of indented YAML over the endless nesting of {} JSON brings, but all formats are easy enough to read or write.
Lack of comments in JSON is a huge problem for config files. It's not an issue if you're just exchanging data between APIs, but for config files, comments are essential.
There are some JSON specs that will do comments, but they rarely specify what dialects of JSON their parser accepts. There are also workarounds that abuse the fact duplicate key handling isn't part of the spec by specifying each key twice, once with a comment and once with data as most parsers only make the second key stick; those are even worse.
You can't add backwards compatible comments to JSON, there's no space in the JSON spec to retroactively insert comments somewhere. The closest you can do is the duplicate key trick, but as the spec doesn't state which of the keys to read as a value, that trick only works with specific parser implementations.
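To make the duplicate-key trick concrete, here is a minimal sketch using Python's standard `json` module, which happens to keep the last value for a repeated key. The `port` key and the `reject_duplicates` helper are made up for illustration; other parsers are free to behave differently, which is exactly the problem.

```python
import json

# The same key twice: once as a "comment", once as the real value.
doc = '{"port": "comment: the port to listen on", "port": 8080}'

# Python's json module keeps the last value for a repeated key.
# The JSON spec does not require this, so other parsers may differ.
config = json.loads(doc)


def reject_duplicates(pairs):
    """Hypothetical object_pairs_hook that refuses repeated keys."""
    keys = [key for key, _ in pairs]
    duplicates = {key for key in keys if keys.count(key) > 1}
    if duplicates:
        raise ValueError(f"duplicate keys: {sorted(duplicates)}")
    return dict(pairs)


# A stricter consumer of the same document rejects it outright.
try:
    json.loads(doc, object_pairs_hook=reject_duplicates)
    strict_result = "accepted"
except ValueError:
    strict_result = "rejected"
```

So the "comment" survives only as long as every consumer of the file is as lenient as this particular parser.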
YAML has newlines that indicate the end of a field... unless otherwise specified, which then gets into issues with line endings. Indents can also cause issues, considering a space inserted somewhere by accident can mess up your whole document. JSON and XML rely on specific tags for elements, which are much more reliable, and thus easier and faster to parse.
You can easily add comments to JSON spec, by just writing every parser going forward with the added comment parsing. It would read old JSON non commented files just fine.
- introduce the least idiomatic form of YAML as "depending on preference"
- add an optional preamble to the XML example
- add comments when there were none, but also not to all of them
If you didn't artificially stretch the different examples to match, there'd be a much clearer difference between them all, especially considering the fact one of your three examples is a superset of the other.
The odd one out can't even capture an integer vs a string without a schema.
I was puzzled by the GP's choice to write the first YAML example in a completely unidiomatic way.
But I think the point of adding comments was just to show that comments are possible in some formats, and not others. Omitting possible comments from the XML example might have just been a sign of fatigue over this topic ;)
At any rate, I find the (idiomatic) YAML example to be -- by far -- the most readable of all, including the GGP's TOML example.
- To showcase to the many people who say JSON is more legible that you can use YAML as "legible" JSON
- Most XML files I encounter come in this format. You can skip the preamble but it wouldn't match my real life experience.
- All readable config files I encounter have comments. I forgot to add comments to the XML representation, but I can't edit my comment anymore. I think everyone who ever encountered XML knows how to add comments, though. JSON simply doesn't support comments unless you use a niche JSON derivative.
As for the string versus integer problem: you always need a schema, or you'll run into very funny problems down the line. None of these formats intrinsically know what keys refer to an object and what keys refer to a string, that's all based on your schema anyway.
"10" is a string. 10 is a number. [10] is a single number in an array. {"number":10} is an object.
You're conflating advice for databases with advice for data serialization formats: XML captures less information about the data it contains intrinsically.
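A quick sketch of that difference with Python's standard library (the `<value>` element name is just an illustration): the JSON parser hands back distinct types with no schema involved, while the XML parser hands back text either way.

```python
import json
import xml.etree.ElementTree as ET

# JSON: the syntax alone distinguishes string, number, array, and object.
samples = ['"10"', '10', '[10]', '{"number": 10}']
json_types = [type(json.loads(text)).__name__ for text in samples]

# XML: element content is text; turning it into a number is the reader's job.
element = ET.fromstring("<value>10</value>")
xml_text = element.text
xml_text_type = type(xml_text).__name__
```

Here `json_types` comes out as `str`, `int`, `list`, `dict`, while the XML text is a `str` until something outside the document says otherwise.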
Also please don't use YAML as "JSON with comments", you're just asking to run into some obscure bug/corner case
If you're willing to do weird things there's always JSON5
I've gotten bit by trailing commas enough times (both manual edits and writing generators) that I absolutely expect any reasonable syntax to tolerate them. It's just so much easier and more consistent to tolerate them.
That's what Clojure got right. Comma is whitespace. When you print a data structure, it has commas. But they don't affect reading of that structure. It's brilliant and practical.
It does. However, it’s much more common to edit the end of a list, in my experience. Still, a syntax that is entirely uniform (like trailing commas) is preferable, in my opinion.
JSON5 does, but most software just does plain and simple JSON. I haven't seen it used outside some Javascript webdev environments. The JSON5 docs also seem to be specifically targeting Javascript development.
If you're sticking to certain variants, you may as well use YAML, which supports JSON notation, as well as comments and various other improvements.
If you use python, there is an excellent json5 module. But true, it may not be as well supported by other languages.
I am not sure I may as well be using yaml. I don't like it for the multiple reasons in the OP and this thread.
If you are using python, I have found it to be quite easy to support both json5 and yaml, as well as converting between them for people who feel strongly about yaml. Not trivial but low effort.
None of them handles it well. YAML has 7 different modes to do it, so you will inevitably mix them up and use the wrong one; otherwise it's actually the only option that supports it. JSON requires inlining \n. XML only does it with whitespace indentation.
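For the JSON part, a minimal sketch with Python's `json` module (the `motd` key is invented for the example) shows what "inlining \n" means in practice: the serialized document is a single line and the line break only exists as an escape sequence.

```python
import json

message = "first line\nsecond line"

# Serializing escapes the line break, so the JSON text stays on one line.
encoded = json.dumps({"motd": message})

# Parsing restores the real newline character.
decoded = json.loads(encoded)["motd"]
```

It round-trips fine, but a human editing the file has to read and write the escaped form.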
For human-maintained config, TOML is only "better" when the structure is so flat that it's almost indistinguishable from an INI file.
Anything more complex and it becomes one of the worst choices due to the confusing/unintuitive structure (especially nesting), on top of having less/worse library support.
YAML's structure is straightforward and readable by default, even for fairly complex files, and the major caveats are things like anchors or yes/no being booleans rather than the whitespace structure. I'd also argue some of the hate for YAML stems from things like helm that use the worst possible form of templating (raw string replacement).
I'm with you on all that. I think YAML's fine, and I like it way more than TOML for non-trivial files.
I think Python's pyproject.toml is a great use of TOML. The format is simple with very little nesting. It's often hand-edited, and the simple syntax lends itself nicely to that. Cargo.toml's in that same category for me. However, that's about as complex of a file as I'd want to use TOML for. Darned if I'd want to configure Ansible with it.
Agreed, I do a lot of Ansible, and it took me a while up front, but I've become pretty accustomed to YAML. Though I still struggle with completely grokking some of the syntax. But I recently took a more serious look at TOML and felt like it'd be a bear for Ansible.
A few months ago I made a "mini ansible / cookie cutter" ( https://github.com/linsomniac/uplaybook ), and it uses YAML syntax. I made a few modifications to Ansible syntax, largely around conditionals and loops. For YAML, I guess I like the syntax, but I've been feeling like there's got to be a better way.
I kind of want a shell syntax, but with the ansible command semantics (declarative, --check / --diff, notify) and the templating and encryption of arguments / files.
> For human-maintained config, TOML is only "better" when the structure is so flat that it's almost indistinguishable from an INI file.
Agree. I've recently inherited a python project, and I'm already getting tired of [mentally.parsing.ridiculously.long.character.section.headers] in pyproject.toml.
Seriously, structure is good. I shouldn't have to build the damn tree structure in my head when all we really needed was a strict mode for YAML.
> I'd also argue some of the hate for YAML stems from things like helm that use the worst possible form of templating (raw string replacement).
I was literally speechless when I saw helm templates doing stuff like "{{ toYaml .Values.api.resources | indent 12 }}", where the author has to hardcode the indentation level for each generated bit of text like a fucking caveman.
The tiny examples might look kinda okay, but when someone has stacked 10 different patch operations in a single file, it gets a lot harder to keep track of what's going on.
“Nesting is bad” is such a simplistic take. Nesting is absolutely essential and inescapable. What that statement is really doing is placing a limit on what whatever it applies to can be used for. It would be better to spend a few more words expressing what you really mean.
Your comment is a simplistic take on "Nesting is bad" given the context.
It's not hard to infer that they're referring to nesting as a footgun: make it harder and you lose some power but you keep your feet.
Config files are a poor place to express complex and deeply nested relationships. If it's not ergonomic to reach for nesting, people tend to be forced to rethink their approach.
The problem is "config" means different things to different people. Some people see config as "the collection of runtime parameters" basically a bank of switches: Pyproject.toml is config. Others see any form of declarative structured data ingested by a runtime as config: docker-compose.yml is config.
And of course to minimize impedance mismatch, the structure should be similar to the domain.
So yes I want a "config file" to handle at least a dozen levels of nesting without getting obnoxious.
Then I guess to frame it in your language: they want formats that encourage config files, not "config files".
And I don't disagree. The problems of nesting objects "at least 12 levels deep" aren't going to be solved by the right format. The tooling itself needs to expose ways to capture logical dependencies other than arbitrary deep K-V pairs.
What if your problem is best expressed as "arbitrary deep K-V pairs"? It's going to be more common than not, nesting really is that fundamental.
There is no escape, you can't win. If you want the nesting, and assuming you can't remove it from the problem itself (as you often can't, or at least shouldn't), there's only one thing you can do: move inner things out, and put pointers in their place. This is what we do when we create constants, variables, and functions in our code: move some of this stuff up the scope, so it can be used (and re-used) through a shorthand. It loses you the ability to see the nesting all at once, but is necessary (among other reasons) when the nesting is too large to fit in your head.
Of course once you do that, once you introduce indirection into your config format, people will cry bloody murder. It's complex and invites (gasp) abstraction and reuse, which are (they believe) too difficult for normies.
The solution is, of course, to ignore the whining. Nesting is a special case of indirection. Both are part of the problem domain, both are part of reality. Normies can handle this just fine, if you don't scare them first. You need nesting and you need means of indirection; might as well make them readable, too. Conditionals and loops, those we can argue about, because together they give a language Turing-complete powers, and give security people seizures. And we have to be nice to our security people.
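A minimal sketch of "move inner things out, and put pointers in their place", in Python. The `$ref` key, the `definitions` table, and the `resolve` helper are all assumptions made up for this example rather than any particular tool's format, and there is no cycle detection.

```python
def resolve(node, definitions):
    """Replace {"$ref": name} placeholders with the named definition, recursively.

    Hypothetical convention for illustration; a self-referencing
    definition would recurse forever.
    """
    if isinstance(node, dict):
        if set(node) == {"$ref"}:
            return resolve(definitions[node["$ref"]], definitions)
        return {key: resolve(value, definitions) for key, value in node.items()}
    if isinstance(node, list):
        return [resolve(item, definitions) for item in node]
    return node


config = {
    # The shared block is hoisted out once...
    "definitions": {"small": {"cpu": "250m", "memory": "128Mi"}},
    # ...and referenced by name wherever it used to be nested.
    "services": {
        "api": {"resources": {"$ref": "small"}},
        "worker": {"resources": {"$ref": "small"}},
    },
}

expanded = resolve(config["services"], config["definitions"])
```

The file gets shallower and the shared values live in one place, at the cost of having to follow a name to see the full picture.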
This is whining that people won't endorse a lazy, poorly scaling approach to an engineering problem... and justifying that approach by conjuring hypothetical whiners against a common, better scaling solution.
If you need 12 levels of nesting, add indirection, or live with the fact no one is designing formats to enable your oddball mess of a use case.
12 levels of nested braces in a single function is already a crappy idea: it's an even more crappy idea in a config file because of the generally inferior tooling, and now there's a downstream component that needs to change to support a cleanup (meaning it almost never gets fixed and the format just gets worse over time)
> For human-edited files, pick almost literally anything else.
I'd still take XML over JSON for human-edited files. At least XML supports comments.
> For serialization, JSON handles the common cases
Counter-point, JSON sucks and is way overused. The types are too fuzzy, the syntax too quirky, and validators/schemas are almost never present. You can bolt that all on, but it wasn't designed for it and it shows. It was designed to be eval()'d, which you should also never do because it's a terrible idea. It's flawed at the foundations.
Those are valid points, and while I have a different opinion, I can't say you're wrong about any of it.
But I will say that the first time I used a JSON API that had replaced an XML one, I almost wept with relief. Perhaps because JSON is so simple, it pushed APIs toward having simpler (IMO) semantics that were far easier to reason about. Concretely, I'll take an actual REST API (that is, not just JSON-over-HTTP) over the SOAP debacle any day of the week. I know you can serve XML without using SOAP, but to me they're both emblematic of the same mindset.
> Counter-point, JSON sucks and is way overused. The types are too fuzzy, the syntax too quirky, and validators/schemas are almost never present. You can bolt that all on, but it wasn't designed for it and it shows.
XML's validation, schemas and typing are far more complicated and equally useless - the impedance mismatch is too big, all they do is give you a whole bunch of extra ways to shoot yourself in the foot, particularly in the presence of namespaces. If you want something fully structured, protobuf or equivalent is the way to go (and converting back and forth between protobuf and JSON is relatively painless).
XML Namespaces are one of the worst anti-features I have ever encountered. I have yet to see a legitimate use for them, but they make parsing way more of a pain than it needs to be.
Also the use of attributes vs nested tags seems pretty arbitrary and in my experience attributes are hardly used at all.
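For anyone who hasn't hit the namespace pain, a small sketch with Python's `xml.etree.ElementTree` (the `http://example.com/ns` URI and the element names are placeholders): once a default namespace is declared, the obvious lookup silently finds nothing.

```python
import xml.etree.ElementTree as ET

doc = '<config xmlns="http://example.com/ns"><server>alpha</server></config>'
root = ET.fromstring(doc)

# The naive lookup returns None: the element's real name includes the namespace.
plain = root.find("server")

# Either spell out the namespace in Clark notation...
qualified = root.find("{http://example.com/ns}server")

# ...or pass a prefix map and use a prefix that appears nowhere in the document.
via_map = root.find("c:server", {"c": "http://example.com/ns"})
```

No error is raised on the first lookup, which is what makes this easy to get wrong.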
They have a parse.y that sort of gets traded around and joins new projects. Nothing super formal, but it does mean most OpenBSD service configurations feel like each other, while each config is tailored to its application. And being properly parsed, the error messages can be better.
In fact, that is my biggest beef with YAML. I mainly use it in the context of Ansible, and the parser usually has no clue where in the file the error actually is. You have to depend on remembering where you last edited to actually find the error. My other big problem with YAML is that the Ansible context is trying very hard to make it a programming language... And while it is an okay-ish config language, it is a terrible programming language.
In fact, this is a common problem with many complex environments. They want to try and push this complicated setup into a config file and claim "look, it is easy, no programming required" when really what they have done is push a programming situation into the world's worst programming language. See also: XSLT.
# Diff for interactive merges.
# %s output file
# %s old file
# %s new file
merge="sdiff --suppress-common-lines --output='%s' '%s' '%s'"
it's useful the first time you dive in, not having to read the man page. But over time, the comments can get out of sync especially if you don't carefully merge in the package-maintainer's version every update.
Do you mean non-json types? Because the supported types seem pretty straightforward. (besides perhaps supporting null bytes in strings in things like postgresql)
> the syntax too quirky
Care to explain? This has always seemed like one of jsons strengths. The syntax for what is valid is pretty straightforward.
> Do you mean non-json types? Because the supported types seem pretty straightforward. (besides perhaps supporting null bytes in strings in things like postgresql)
What is a "number"? Is it a float? int? short? double? BigDecimal?
What about time values? Or dates? Oh, you have to just shove those into strings and hope both sides agree? That's fun.
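A sketch of how this plays out in one implementation, Python's `json` module. The specific values are only illustrative, and other languages will make different choices, which is the point.

```python
import json
from datetime import datetime
from decimal import Decimal

# The same JSON "number" type becomes int or float depending on how it is written.
as_int = json.loads("10")
as_float = json.loads("10.0")

# 2**53 + 1 survives in Python, but not in a parser that stores numbers as doubles.
big = 9007199254740993
python_big = json.loads(str(big))
double_big = float(big)  # what a double-based parser would end up holding

# Exact decimals are opt-in on the reading side.
exact = json.loads("0.1", parse_float=Decimal)

# Dates are just strings; both sides have to agree on the format out of band.
when = datetime.fromisoformat(json.loads('"2024-01-31T12:00:00"'))
```

None of these choices is visible in the JSON text itself, so two correct parsers can disagree about what the document means.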
> Care to explain? This has always seemed like one of jsons strengths. The syntax for what is valid is pretty straightforward.
One example is the json "spec" on json.org does not allow trailing commas yet many parsers do
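As one data point, Python's standard `json` module is on the strict side here. A minimal check (the `servers` list is just sample data):

```python
import json

# A trailing comma after the last array element.
doc = '{"servers": ["alpha", "beta",]}'

try:
    json.loads(doc)
    outcome = "accepted"
except json.JSONDecodeError:
    outcome = "rejected"
```

With this parser the document is rejected, so a file that loads fine in a lenient tool can fail here.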
Yeah, I personally find XML elegant and well-designed compared to the popular alternatives today. And things that supply what’s missing (for example, JSON-LD and JSON Schema) aren’t much less complicated than the XML equivalents.
Of alternatives, I think EDN is the closest to being a satisfactory replacement because it supports namespaces.
Tangentially: there are some objects in the Kubernetes configuration that require the data to be base64 encoded (I think it's secrets and config maps, probably something else). When I was preparing for my CKA certification, I used the (most?) popular course that introduced base64 encoding as a _security measure_. I think that also says something about the state of the industry.
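In case it needs spelling out, a tiny Python sketch of why base64 is an encoding and not a security measure (the sample value is invented): anyone who can read the stored string can reverse it with no key at all.

```python
import base64

# Encode a made-up secret value to base64 text.
encoded = base64.b64encode(b"hunter2").decode("ascii")

# Decoding needs no key or password, only the encoded text itself.
decoded = base64.b64decode(encoded)
```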
No no no, k8s mistake was actually not using YAML hard enough. They built an object system on top of a format that can act as a typed serialization format for generic objects and then decided to just ignore all that and implement it on top of primitive types
in terms of something i'm shipping to users, i would much rather have to worry about the nuances of TOML than of YAML. with TOML, i don't have to worry about e.g. remote code execution because someone figured out a clever way to trick my yaml into running arbitrary code somehow. that kind of shit is annoying.
is it uglier? sure. but it's peace of mind...
in terms of config i'm using myself, say Kubernetes stuff, i really love YAML...because i know exactly what it's doing and i generally keep things simple. it's nice for that, it just does way too much IMHO...
I love it. Was so happy to see that syntax when I first encountered Toml. Hierarchy is always clear without having to scroll, and snippets retain context.
> I'd echo the linked article's argument back: I don't know of a case where XML is the best option.
From the article:
> “I’m making a new kind of book, and I need to annotate all of the verses in the Bible, and have the chapter headings and stuff.”
XML is a markup language, and works great for marking up text. It came out of SGML and attempts to make machine-usable documentation.
I'd favour its use in something like a datasheet, which is a combination of human-readable information, nested objects, and lots of stuff that needs to be machine parseable in fairly precise ways.
IMO it also works "fine" for other structured document formats that aren't text, like SVG, but that's not a strong opinion. JSON and other formats compete more sensibly here, but I'd never want a protobuf-based format for... writing an essay, for example.
XML is pretty good for information extraction with LLMs. It will parse your input text into a tree structure. JSON and YAML will conjure slightly different skills from the LLM. Maybe it all comes from the slightly different applications of these formats in the training corpus.
A flat TOML file is indistinguishable from INI except on one point: you can create a pure key-value file without sections, which most INI parsers I've seen won't accept.
And yet it's the only one out of all those mentioned that can be written with ease by humans. After a short while you can do several levels of nesting without thinking about it. For all its issues, it's by far the most practical which is why it won.
Practical but incorrectly used. We should correct the misunderstanding so that people don't continue to make similar mistakes, inconveniencing themselves and others.
i agree, while YAML has its problems, it is good for human-written content, especially when it's not too deeply nested.
in short: all these have their use-cases. i use YAML where it fits and TOML where it fits better. I never use JSON because JSON is just for machines, I generate JSONs.