No offense to the creator of YAML, but: The fact that it became one of the de-facto standards for cloud tooling is an absolutely damning statement about the state of the industry.
I get that XML is about as sexy as mainframes, and that a lot of folks here probably have PTSD from working with Java/Spring web apps, but YAML is about the worst of all worlds.
Though I think the real problem is that real-world configuration files are way too complicated for a simple/dumb/logic-less representation like a .ini/.conf file, so someone thinks to add some logic to it - which is just config-as-code. In a terrible programming language.
If you want config-as-code (and you want to!), just do it properly and use a proper programming language for it. Don't care which one, be it JavaScript, Python, Go, PDP-11 Assembly, or Rust. But please stop with these half-measure DSLs that just don't cut it.
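To make the point concrete, here's a minimal sketch of config-as-code in Python (all names and values invented for illustration): the config is an ordinary function, so conditionals and reuse come for free instead of via a templating DSL.

```python
import os

def build_config(env: str) -> dict:
    """Build the full configuration for one environment; plain code,
    so conditionals and reuse need no templating layer."""
    debug = env != "prod"
    return {
        "debug": debug,
        "database": {
            "host": "localhost" if debug else "db.internal",
            "port": 5432,
        },
    }

config = build_config(os.environ.get("ENV", "dev"))
```

The same structure in a DSL would need a conditional syntax, an environment lookup syntax, and so on, each invented from scratch.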
Most people who lambast XML probably have never used XML, or never genuinely needed it.
I designed a complex data acquisition system and after a lot of research, I settled on XML as the only viable option for complex user configuration, that is both readable and rich in content.
I then built a UI system that works with the XML and generates config docs with ease.
Sure, XML has been used in SOAP like systems, and it rightfully gets a bad rap, but that is more on the user than the technology.
A hierarchical document with values and attributes and custom tags? That is an almost DSL in itself.
> I then built a UI system that works with the XML and generates config docs with ease.
I lost you there. Not that I'm criticizing you since I've gone the same route and built a complex UI tool to manage said config (complete with XSD schema validation and config schema migration using XSLT for version upgrades).
But now I realize that modern developers don't want a GUI to manage their config. We want to store it in git, review changes and perhaps even write our own automation and templating around it.
YAML is certainly a flawed format for most of these purposes, but so is XML. It is unnecessarily verbose and it carries a lot of complexity which was designed for a highly-extensible generic document format, but not for configuration files. XSD, Namespaces, Entities, Embedded DTD, CDATA blocks... You can't just ignore all of these, and there are very few parsers out there which work on a well-defined subset of XML. And even there, the whole attribute-vs-child-element choice is a giant distraction and constant source for unnecessary bikeshedding.
YAML has serious ambiguity issues, but there are better alternatives that have great library support like TOML. We don't have to go back to the excesses of the early 2000s and use XML as a configuration format.
In XML's defense, the attribute-vs-child-element choice is always obvious and straightforward when you're using it as a markup language. It's only when you use it as object notation that it becomes wonky, because there the choice simply doesn't map to any straightforward characteristic of the problem domain. I'm not sure I want to hold XML responsible for not being good at being something it isn't.
And, on that note, I think that if I could pick a single thing that irritates me the most about YAML, it's that it isn't actually a ML.
I'm not criticizing having attributes-vs-child-elements. The problem with XML is not that it exists or that it is a particularly badly designed markup language.
The problem is that XML is ill-suited for configuration files.
Why? I can store it in version control. I can validate if the file has syntax errors or schema errors etc. etc. With a proper platform it's easy to generate or parse. It allows for hierarchical structures (looking at you, .ini). The only downside might be that you need a larger library to parse it. But I probably encounter XML somewhere along the line anyway due to interactions with a 3rd party system, so that point is moot.
You have a low bar for config languages. I'd also like one to not be overly verbose, not be clearly intended for markup instead of object storage and not have security issues until you configure the parser just right (though that's often not an issue with configs). XML fails all of these, and YAML isn't much better.
If we're limiting ourselves to just JSON, INI, XML and YAML as potential choices, I get why people cling onto one of these suboptimal choices and then fiercely defend it, but there are other options. There's libconfig, JSON5, Dhall, various interpreted languages...
Sadly these alternatives are far from mainstream and have no support in standard libraries, so I think most developers will continue to pick whichever common option's issues they can deal with.
I'd like my config files to have two words in it "do it" and for it to just know what I want.
But at the end of the day that's not possible because reality intrudes. You give up a lot for that lack of verbosity, not everyone will agree that it's a worthy tradeoff.
It's a noble goal to want a simple configuration format, but TOML is far from the simplest: a line-separated options format is simpler. The fact that TOML needs a dedicated parser indicates that it creates a bigger parsing problem than necessary.
Because people reinvent half of it, a different half every time in each project, and call it 'configuration', which somehow grants them poetic license for whatever crazy dung they come up with.
Compare ansible vs helm vs github actions vs literally anything with nontrivial config.
Note that if you asked me for real, I'd be a dhall proponent, but most normal people look at me funny the first time they hear it and then the second time they see the syntax.
I'm curious why you think that is? Especially since you are likely to want to style the configuration for display to the user in many cases, having it be a document makes a lot of sense.
That is, if the value is something you would expect to be able to show to a user, then it probably shouldn't be an attribute. If it is a value that changes how you would show it, then an attribute makes a lot more sense.
I also think a strong adherence to any preference between attribute and markup is a touch too dogmatic. The only real distinction at the language level is if you allow children, or not. If it can have children, it pretty much has to be a child and not an attribute.
Which is especially problematic for config files where you want helpful hints or a place to store the “old value” while you do something else with the app. I have an app where I change a specific config almost every time I use it. I usually just copy the entire darn node and rename it to something that doesn’t get deserialized. Closest I can get to holding the value in a comment.
Then again, Gradle has come to show why that is a terrible idea.
I think at some point 24 out of 28 Gradle projects I had access to at a certain customer had variations in either kotlin/Groovy style or the way they did or didn't use variables, how they did or didn't do loops or maps and what not.
With Maven you (or someone who knows Maven) can immediately look at a rather small, very standardized file and start making educated guesses, and so can an IDE.
With Gradle you sometimes have to run it to actually know what it will do.
I had the same experience with Maven vs SBT (scala build system, config is scala). At first it is really cool to have access to a full programming language (in particular when it is the same as the one the project is in, which means that you do not need to "switch brains" when working on the config), but quickly people start trying to be smart or cute, and it becomes a big mess. In particular in Scala, where people _love_ defining new DSLs and favor cuteness over readability. After two years working with SBT I still do not really understand some of the DSLish constructs used in there (and I tried to read the docs).
On the other side I fell in the trap of trying to overcome the limitations of purely declarative config formats by using jinja templates, which also ended up being a very bad idea and a maintenance nightmare.
For most projects, my approach is now to try to be as standard as possible compared to the particular community in the tech at hand, and resist the urge to be smart or cute (hard!). Configuration always sucks, and I now prefer to just suck it up and get done with the config part, rather than losing time reinventing the wheel, ending up with a config that still sucks _and_ no one understands.
The good thing about Maven is it is XML so everyone wants to keep it as short as possible ;-)
(More seriously: with Maven shorter and more boring is a sign that everything is correctly configured. Maven works by the convention over configuration principle so if you don't configure something it means it follows the standard. Which again means if you see someone has configured for example a folder or something that usually isn't configured it means they have put something in a non standard location.)
The JSON modules in the same languages don't support the madness and don't necessitate a safe loader.
It's possible someone could come along and write a json library that would support this, but somehow we have made it this far without it and that's a good thing
The point is that yaml and xml both have side effects in the form of require and eval that json won't, and frequently people are unaware of this
Perhaps yaml and xml have _more_ ways to inject behavior into an application, but I would still not consider JSON safe in any way. Why would JSON.parse() even exist if `require()` and `eval()` were safe to use?
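A Python analogue of that point (illustrative only - the thread is talking about JavaScript, but the contrast is the same): a real JSON parser treats its input as inert data, while eval() happily executes it.

```python
import json

# Attacker-controlled input containing something that looks like code.
payload = '{"cmd": "__import__(\'os\').getcwd()"}'

# json.loads treats the document purely as data; nothing executes.
data = json.loads(payload)
kind_after_parse = type(data["cmd"]).__name__  # still just a string

# Feeding that same attacker-controlled text to eval() runs it.
result = eval(data["cmd"])  # actually calls os.getcwd()
```

Which is exactly why dedicated parse functions exist: they draw the line between "read this data" and "run this program".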
> But now I realize that modern developers don't want a GUI to manage their config.
I think, like almost all software, it depends on personality and other factors like how frequently someone has worked with a language/tool/etc.
If someone has the traditional Unix "read all the man pages in their entirety" kind of brain, they'll probably never use a GUI.
If someone has more of a "learn by example, then refer to the docs to cover cases the examples didn't handle" brain, a GUI can be a much easier way to clearly expose all the potential functionality of a tool. It's much easier to understand (IMO) than e.g. a template configuration file with every possible option present but commented out.
I recently started a new project, and just outright didn't want to use yaml to do it. Sadly, I'm not everyone, and if I plan to release this project, I need to account for that. What's a guy to do?
I'll tell ya hwhat... I made some simple functions; one to load any yaml/toml/json from file, by just looking at the extension (and providing an override for unusual file extensions). Another to output any data as yaml/toml/json.
I defaulted to using toml for my project, but have provided ways for everyone to be happy, with zero mucking around.
In any case, the library I linked has the functionality to load any of the three configuration types, so while I prefer toml, if you grabbed my (under development) project, and hate toml, you can use yaml or json, if you wish.
No. You need to define what configuration files are valid or not, but defining a subset of XML on the basis of syntactic and structural alternatives is lame and nonstandard.
Get a library that parses actual XML, allow anything reasonable as input (disallowing, for example, external entities, DTDs, PSVI etc. to keep the file self-contained) and "flatten" CDATA sections, entities, namespace prefixes, idrefs etc.
> But now I realize that modern developers don't want a GUI to manage their config. We want to store it in git, review changes and perhaps even write our own automation and templating around it.
not all configurations are maintained by developers, or at least theoretically they shouldn't be.
I have used XML a fair amount in past years, but now avoid it.
There is a subset of XML that is a decent language for some use cases. In particular it is good for documents where you want a _markup_ language.
But I've seen it used for a lot of things where it wasn't a great fit.
And XML has too many features, which leads to implementations that are inconsistent with each other, often slow, and have security problems such as XXE.
I wish that there was a standardized, simplified XML format that would avoid many of XML's problems and meet the needs of 90% of applications where XML is a good fit.
> Most people who lambast XML probably have never used XML or never had to need XML.
This seems like an absurd thing to say. XML went through an extreme bout of popularity. Maybe you could plausibly say that people have been soured on XML by ill-conceived uses for it that don't demonstrate its strengths... but you think most people have never worked with it? Come on.
I've done my fair share of systems integrations, and the number of teams that did not know XML or had never used it in a professional setting was around 20%. Of those who did use it, a staggering number of people never understood namespaces and started testing for equality on the element name and namespace prefix string instead of the namespace declaration. When somebody claims they "know" XML, I initially treat it like a developer saying they know SQL while I review their code and see them doing joins in the application logic.
That is the problem with XML: so much logic is needed in the application just to be able to understand it. You need so much knowledge.
> Maybe you could plausibly say that people have been soured on XML by ill-conceived uses for it that don't demonstrate its strengths... but you think most people have never worked with it? Come on.
So, according to the link - The average software developer age is between 25 and 34 years.
I think we should also define - are we talking about people who have worked with XML because they made a google site settings xml file OR people who have done serious work with XML and know what they're talking about?
First type - pretty much everyone.
Second type - not very many. I'm pretty much the only person who knows anything about XML wherever I go.
If you are 25, you have probably not done anything with XML, or at least not anything important.
If you are 34 you might have, but the last time I did anything really important with XML was 2013. I have done a few other things with it since then, either because I knew XML was the best solution, or because I was doing something very niche and the company was providing an XML API.
I bet most of the 34-year-olds have not done anything meaningful with XML either, even though if you are 34 you probably had some ticket at some point that took you a week, and you thought: wow, my extensive experience with XML now gives me the right to grouse about how bad it is! If only everybody knew as much as I, the world would be a better place!
on edit: my example of google site settings file is an example of some trivial usage, not meaning that pretty much everyone has done that exact trivial usage.
I am 34 and I feel like when I started at this job we were still going through the “everything must be XML” hangover. But hey there’s always a higher mountain.
It depends on what age you are and whether you work with documents. If you started working in the industry past 2010 and didn't have to work with generating/reading (X)HTML/OOXML/ODF, then it's rather likely you've never had an experience with XML (fortunately SOAP was deprecated very quickly).
That point about arrays is such a weakness in XML that I rarely see addressed. Arrays and lists are such a common data structure in almost every programming language of the past 40 years that not having first class syntax for representing them is absurd and a huge weakness that makes XML a non-starter for me.
The qualities of sets that arrays don't have (and vice versa) are irrelevant to the point of neither being implicitly representable in XML.
You're either providing complex objects as properties or you're providing a list of complex objects. Worse, you can have a combination of both. Without a schema it is not possible to infer whether either or both is happening.
No you've misunderstood my point. This doesn't work for cases where one child is in fact a property that is a complex object.
XML claims to solve the problem of attributes vs children but then falls short at the first hurdle by not discerning between a single complex object as an attribute and an array of complex objects as children.
JSON and YAML do not have this problem as they are explicit in their representation.
YAML example:

    parent:
      child: name

vs

    parent:
      - child: name
Try converting each of these to JSON. The former gives you "parent" as an object with a property called "child"; the latter gives you "parent" as an array containing one such object.
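Sketching that conversion in Python terms - the structures below are what a YAML loader would hand back for the two snippets:

```python
import json

# What the first snippet deserializes to: "child" is an object property.
object_form = {"parent": {"child": "name"}}

# What the second deserializes to: "parent" holds a one-element array.
array_form = {"parent": [{"child": "name"}]}

print(json.dumps(object_form))  # {"parent": {"child": "name"}}
print(json.dumps(array_form))   # {"parent": [{"child": "name"}]}
```

One extra `-` character flips the whole type of the value, which is exactly the explicitness XML lacks.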
I think the verbosity is not a problem. For example if you compare
    ["string1", "string2"]
to
    <list>
      <e>string1</e>
      <e>string2</e>
    </list>
then each element has about four bytes overhead (<e> instead of " and </e> instead of ",) plus some overhead for the list itself that may be offset by putting the name of the list itself into the element.
However, the issue is that you have to write a custom parser. There is no direct mapping between your data structure and the XML file. These developer ergonomics are a big win for JSON and consequently YAML.
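A small stdlib illustration of that difference: json.loads lands directly on native lists and strings, while with XML the mapping from elements to values is a decision you make per document.

```python
import json
import xml.etree.ElementTree as ET

# JSON: the parse result *is* the data structure.
from_json = json.loads('["string1", "string2"]')

# XML: the parse result is a tree; turning it into a list is your code.
root = ET.fromstring("<list><e>string1</e><e>string2</e></list>")
from_xml = [e.text for e in root.findall("e")]
```

Both end up at `["string1", "string2"]`, but only one needed a mapping step.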
> There is no direct mapping between your data structure and the XML file.
I think that's by design, tbh.
it's only a big win for JSON (and YAML) because the default case works OK - but every time someone has a problem parsing numbers in JSON (because the value is bigger than Integer.MAX in the host language), this is the cause.
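That failure mode is easy to reproduce: a parser that maps every JSON number to an IEEE-754 double (as JavaScript's JSON.parse does) silently corrupts large integers, while Python's json module happens to keep them exact.

```python
import json

big = 12345678901234567890  # far beyond the exact range of a double

# Python's json parses integers exactly...
exact = json.loads(str(big))

# ...but coercing through a double, as a JS-style parser must, loses digits.
via_double = int(float(str(big)))
```

Same wire format, different host language, different answer - which is the "no direct mapping" problem sneaking back in.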
Yes, I understand that (and I like XML as a format and XSLT 2.0 as a language). However, from the popularity of JSON, it seems that for most cases it's the easier choice.
Take any random REST API for example. If it returns JSON, you can integrate it more easily than if it returned XML. If you need special cases like large numbers (or date-times), you handle only those.
I'm confused? Integrating XML was fairly easy back in the day. If in a dynamic language, parse it into a DOM and then use XPath to get data out. If in a static language, parse into your objects.
With JSON, you can mostly do the same. Such that I don't necessarily see this as a huge advantage of XML, mind. Having a schema does have some advantages, though.
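For reference, the dynamic-language route really is only a few lines with the stdlib (the document here is invented, and note ElementTree supports only a limited subset of XPath):

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    "<config><db><host>db.internal</host><port>5432</port></db></config>"
)

# Pull values out with (limited) XPath rather than hand-walking the tree.
host = doc.find("./db/host").text
port = int(doc.find("./db/port").text)
```

The type conversion (`int(...)`) is still on you, which is where JSON's native types save a step.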
JSON maps directly only to JavaScript, and only because it was designed as a subset of JavaScript; for other languages you have to use a DOM or serializers, and then there's no difference between the formats. For that matter, XML has generic serializers that can be used instead of writing a custom one every time.
If you interpret the start and end tags of the child elements as syntax indicating the type of each value, then those tags are analogous to, say, the quotes that enclose a string literal. In other words, in
    <foo>hello</foo>
    <foo>world</foo>
the <foo> and </foo> serve the same purpose as the double quotes in
    "hello",
    "world"
with the added benefit that the type system can be much richer (i.e. not everything is just a nondescript string value).
And you don’t even need a comma to separate the values! ;)
The main reason I avoid any typeless language is dates... how to represent a date/time including a time zone has been badly reinvented so many times. A string type is never the way to go there, in my opinion.
One of the classic lessons of the Falsehoods Programmers Believe about Time is that in general you can't correctly do better than simply storing the user's input (and the instant and place they entered it from) verbatim, unless you know something more about what they were entering. It's usually fine to store times in the past as a timestamp since the epoch plus a location, but the meaning of "2025-01-28 15:00 in Europe/London, for the purpose of a meeting that's being hosted there but is accessible by video call" is much more subject to change when e.g. countries change time zone. It's also not necessarily the same as "the absolute point in time 2025-01-28 15:00 assuming London's time zones stay as predicted since I entered this on 2023-09-21" or "2025-01-28 15:00 in Europe/London, for the purposes of a meeting that's being hosted in Lisbon but which I'm accessing by video call from London" (because then the Lisbon local time is the source of truth, not the London one, if Lisbon changes time zone).
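In code terms (Python's stdlib zoneinfo; the values are illustrative): store the wall-clock time plus the named zone the user gave you, and resolve to an absolute instant only at the moment you need one, under whatever rules apply then.

```python
from datetime import datetime
from zoneinfo import ZoneInfo

# Store what the user entered, verbatim: local time plus a named zone.
stored = {"local": "2025-01-28T15:00", "zone": "Europe/London"}

# Resolve lazily, so a later change to the zone's rules is picked up
# automatically the next time this runs.
resolved = datetime.fromisoformat(stored["local"]).replace(
    tzinfo=ZoneInfo(stored["zone"])
)
offset_hours = resolved.utcoffset().total_seconds() / 3600
```

If London were to change its time zone rules before 2025, the stored record stays correct and only the resolution changes - precompute a UTC timestamp and you've baked in today's prediction.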
The problem is that xml just isn’t particularly human readable. I don’t think there’s more to it than that. The brackets just make it overly verbose and difficult to read at a glance.
> I don't feel the same ease with JSON or YAML though.
I imagine the same is true for people that don’t like XML. There’s just many more people that find it easy to read JSON, even though there are a few that find it easy to read XML.
You're not alone: it's verbose, requiring " around anything that is a string, and commas to separate array elements. It's a technical format without the technical foundation. It's the civil engineering equivalent of building the Golden Gate Bridge out of wooden beams because that is what you have, not what you need.
The game Rimworld uses XML for describing all of its game objects and it makes modding a fantastically wonderful system as you can modify/replace parts of the object using XPath. The end result is mods rarely conflict as they are able to target the specific mutations they want quite easily
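The mechanism can be sketched with the stdlib (the def below is made up, and ElementTree supports only a subset of XPath, but the patch-one-field idea is the same):

```python
import xml.etree.ElementTree as ET

# A made-up data-driven game def:
defs = ET.fromstring(
    "<Defs><ThingDef><defName>Rifle</defName>"
    "<damage>20</damage></ThingDef></Defs>"
)

# A "mod" patches exactly one field via an XPath-style selector,
# leaving the rest of the def (and other mods' targets) untouched.
node = defs.find(".//ThingDef[defName='Rifle']/damage")
node.text = "25"
```

Because each mod addresses a specific node rather than replacing the whole file, two mods only conflict when they target the very same field.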
> I then built a UI system that works with the XML and generates config docs with ease.
It looks like your experience is mainly with XML as an underlying format, with humans only dealing with it either at the coding level or through a tool generating the needed confs. In that kind of scenario I'd wager any coherent file format would probably work, even if the configs were encoded in brainfuck in the final step.
XML gets hated because we also had to read it and hand edit it as humans, when dealing with system configuration files (the source that will generate the rest of the configuration) and other upstream documents that are the base input for the system to read downstream when a GUI isn't available for that.
I've worked with a Symfony code base that used XML for all the routes and DI declarations, and yes, it was an utter pain to write and edit tags for such simple and repetitive configurations when even an ini file would have been good enough. And god have mercy on the guys that put CDATA sections in the middle of that just to be sure CJK chars wouldn't accidentally trip the syntax.
I have delivered XML data products comprising terabytes of information in (when I last checked) more than 800 schemas to companies around the globe, and people who have a rosy view of XML are missing how it usually works in practice. XML is extremely heavy and brings a lot of half-baked ideas, and consumers are almost never flexible in ingesting the XML. It means teams wind up supporting insanely convoluted schemas that customers will never migrate off of.
I actually think that XML has some of the best tooling in the entire industry. The problem is: It's mostly commercial tooling, which costs money. Stuff like Oxygen or XMetaL is pretty neat, and I always found Visual Studio's "Create Web Client from SOAP" pretty useful (especially if the Web Service is written in .NET, since it auto-generates a proper WSDL file).
I find XML tooling in general feature-complete and great.
Especially editor integration works great, as opposed to YAML, which is so ambiguous that IntelliJ IDEA constantly breaks indentation during copy and paste.
And better conceptual docs, because elements are supposed to be complete, deserializable objects. Having to resort to XPath as the default way of poking at XML, where JSON has a good implicit default schema, made XML feel so clunky.
At least XML has XPath ;-) The implicit JSON-to-object mapping has failed me on character encoding (no way to specify that in JSON). Stupid type errors (dates/large numbers).
I have a deja vu over YAML and XML. There was a similar discussion about this a month or so ago, IIRC.
And I haven't changed my opinion since then, having worked extensively with XML: it is a plague that was brought into the world and it needs to be killed with fire. I understand that when it was released there were no other alternatives, so "it was better than nothing", but it should have died right after JSON was invented. But no, why have a cleanly formatted file when you can have an XML one...
JSON syntax is optimized for serialization and thus unsuitable for other purposes, and for serialization loses to binary formats except for one use case: it's a serialization format that can be nicely embedded in html.
> I settled on XML as the only viable option for complex user configuration, that is both readable and rich in content.
How was it more readable or richer than JSON? There's more stuff in XML, but in my experience that stuff doesn't actually help you any. The schema validation has a lot of detail, but since it can't access your actual system you can't really validate in that detail (like, maybe you can validate that an ID is between 6 and 8 digits long, but really you just want to validate that it's an ID for something that's present in your database). The distinction between attributes and nested tags feels like it should let you express more, but in practice it usually just gives you two equally reasonable ways to write the same thing and causes more confusion. Comments and non-tag text nodes feel nice, but complicate your parsing more than they're worth.
> Sure, XML has been used in SOAP like systems, and it rightfully gets a bad rap, but that is more on the user than the technology.
If one person uses the technology wrong, it's a problem with that person, but if most people use the technology wrong, it's a problem with the technology.
> If one person uses the technology wrong, it's a problem with that person, but if most people use the technology wrong, it's a problem with the technology.
But yaml has the exact same problem. People use it where they shouldn't.
Do a mental exercise and think about what k8s with XML configuration would have looked like, and whether it would have taken off like it did if it had used XML for configuration instead of YAML.
This is just an example.
Besides Microsoft frameworks that use XML as configuration, and Java frameworks and servers using XML for configuration (Tomcat comes to mind), no one else uses it. Maybe traditional software whose programmers don't know any better.
So yeah, I think the world doesn't like XML for really good reasons.
I would have very much preferred it. It would be a lot more readable and accessible. Nowadays there is tooling, but imagine having schemas and autocomplete for all the K8s files from day 1.
So pretty much all enterprises use it. Not bad for tech not in fashion.
K8S problem really isn't YAML, it's that the thing it's trying to configure is a naturally complicated space that really wants to be typed but can't commit to any one language either.
Configuration through code requires a lot of organisational discipline. That’s easy to do if it’s just you and your own code. It’s easy if you have good social connections between the people who review each others code (and you also have code review.)
One bad apple can ruin everything though:
    from config import School, Teacher, Course, MRS

    MNH = Teacher(MRS, "Marissa", "Neve", "Harman")
    BIO = Course("Biology", MNH)
    ST_SIMONS = School([BIO])
Oh what a nice tidy config you have there! Let’s ruin it!
    courses = [BIO]
    if str(today()) < "2023-04":
        courses.remove(BIO)  # list.pop() takes an index, remove() the value
    if today() > DIVORCE:
        for c in courses:
            if c == BIO:
                c.teacher.name = ("Marissa", "Cox")
                c.teacher.title = "Ms"
    # gibberish ad nauseam
I suppose it’s possible to write nonsense code in any language including YAML, so I don’t know if this is a very good point or not. To put it diplomatically though: your cadre of YAML editors, should they be moved over to using Python, are probably the ones who need the most help writing clear code.
This is exactly what I fear will happen. If you look at the YAML some people produce and extrapolate that to one of the most dynamic languages available, it is going to get ugly.
Programming languages for config are better, and I will throw up if I ever have to see “list comprehensions” and null checks in Terraform ever again, but it requires people who can code. If you simply replace YAML with Python and that’s folks’ first time using a normal language, it won’t work. Hence I’m happy to stay with YAML mostly, in such a scenario.
Then again, the YAML of Ansible or various CI platforms basically is code. People who already successfully write that will do well enough in Python. (Not that YAML isn't used in plenty of non-code usecases.)
It is code, but without any of the aspects usually ascribed to code. Abstraction, tools like dependency injection (fancy term, but simple and highly important concept), more complex looping (not just `for each item do`; `for each item do, if item...` etc. are needed).
Unit tests?
Not to speak of tooling support. I can write a Python application with the strictest type settings and have mypy do a lot of heavy-lifting for me, before even running the app once. A bit like Rust. Check out the typestate pattern for what I am on about. It's invariants enforced in the type system, by the compiler. Impossible to misuse: your code simply won't compile. All of that is impossible to have if your types are strings only, with the odd bool and float inbetween.
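For the curious, here's a rough Python sketch of the typestate idea (names invented): each state is its own class and operations return the next state's type, so a checker like mypy rejects operations that are invalid in the current state before the program ever runs.

```python
class OpenHandle:
    def __init__(self, name: str) -> None:
        self.name = name

    def read(self) -> str:
        return f"contents of {self.name}"

    def close(self) -> "ClosedHandle":
        # The operation consumes the open state and yields the closed one.
        return ClosedHandle(self.name)


class ClosedHandle:
    def __init__(self, name: str) -> None:
        self.name = name
    # Deliberately no read(): reading a closed handle is a *type* error.


h = OpenHandle("config.toml")
text = h.read()
closed = h.close()
# closed.read() would be flagged statically by mypy, not just at runtime.
```

None of this is expressible when your whole config surface is strings with the odd bool and float in between.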
I will accept that we cannot have ops people be at least medium-grade developers, which would be needed to apply these topics (I consider myself an in-between, leaning dev). That's simply infeasible. It's two different worlds. I will not accept the premise that these things aren't objectively better though! They're practically inachievable, sadly (or at least a decade away).
So no, in that advanced sense, YAML is not code. And if you can do YAML, yes you will get Python syntactically correct with a little practice. That is only 5% of the way though. Note I'm also not advertising for Enterprise Java-level of code... but more than "YAML but in Python".
I disagree strongly: configuration should not be code. Code should be code. Configuration should be static and simple.
It is not trivial to tell what code is doing, and as soon as your configuration is code then really you're developing a new application which implements another configuration language.
Sure, if your infrastructure landscape is describable statically, that's how it should be done.
Most scenarios nowadays are _not_ like that anymore. There aren't 5 servers next door; there are 300 serverless whatevers half a globe away. Are you going to have 300 list entries, 230 of which nigh identical but 70 subtly different? Trivial in code, almost impossible to express statically.
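A toy sketch of what that looks like in code (numbers and field names invented): the 230 identical entries are a loop, the 70 variations a dict merge.

```python
# Shared defaults for every entry.
base = {"memory": "128MB", "timeout": 30, "region": "eu-west-1"}

# The 70 subtly different ones carry only their deltas.
overrides = {f"worker-{i}": {"memory": "512MB"} for i in range(70)}

# All 300 entries, derived instead of hand-maintained.
fleet = {
    name: {**base, **overrides.get(name, {})}
    for name in (f"worker-{i}" for i in range(300))
}
```

Changing the shared timeout is a one-line edit here, versus a 300-hunk diff in a static file.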
What you're describing, though, is a configuration variability which would be expressible by a static tagging regime - still simple.
If your configuration is getting more complex than that though, again, it's not really configuration anymore - it's a management application which needs to be developed and treated like that. And why it's that complicated should be re-evaluated - i.e. how come this is "configuration" and not something the application detects for itself? Why is it being surfaced to the user (operator) at all?
I think whitespace sensitivity was not the point of the exercise.
But on that topic, I feel whitespace sensitivity is a bad idea even for generic-audience configs. People learn how to use parentheses in primary school. Explicit grouping and nesting isn't black magic.
In general I am sympathetic to whitespace - I find the argument "whitespace insensitivity means the code now contains two conflicting sources of truth: whitespace which the programmer uses, and braces which are the only one that matters because that's what the computer uses" to be compelling.
But the whitespace sensitivity of YAML is particularly bad because of e.g. Helm. Templating a whitespace sensitive language with string interpolation is such a terrible idea.
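The failure is easy to demonstrate without Helm at all: interpolate a multi-line block into indented YAML and every line after the first lands at the wrong depth (which is exactly why Helm ships `indent`/`nindent` template helpers).

```python
template = "spec:\n  containers:\n    {block}\n"

# Two-line block; the second line carries only its own relative indent.
block = "- name: app\n  image: app:v1"

rendered = template.format(block=block)
print(rendered)
# spec:
#   containers:
#     - name: app
#   image: app:v1     <- escaped its parent: now a key under "spec"
lines = rendered.splitlines()
```

String interpolation knows nothing about the indentation level of the insertion point, so the result is structurally different YAML, not just ugly YAML.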
> I get that XML is about as sexy as mainframes, and that a lot of folks here probably have PTSD from working with Java/Spring web apps, but YAML is about the worst of all worlds.
I really like your sense of humor, much appreciated. Question is whether the PTSD came out of the XML or the Java usage.
Also indeed we're eager to see finally some reasonably good Assembly configs, it'd definitely be a blow in the face of Rust magicians.
I would vote for Prolog as a config language, though. If memory serves right - some 10 years ago Matt Sergeant of the Perl community gave a very good talk about this approach, as a result of his explorations of this area. Interestingly he has some definite experience with XML as well: https://www.xml.com/pub/au/22.
Tcl actually excels as a configuration format as it has a simple syntax and you can initially restrict the available commands to a safe, non-Turing subset then add them back in piecemeal as they become necessary for more powerful scriptable setups.
YAML is easy to read, everything has a parser for it, and it plays well with version control, which cannot be said of most serialization formats. Not sorry for using it in projects and will continue to do so.
Yes but it is not easy to read (or write) and version control works less well with it than YAML. And having more complex features is part of what people disliked about XML in the first place.
If that exact same configuration were JSON or XML would it be easy to sort out? It seems obvious to me that it wouldn’t because the configuration itself is complex.
Structure and schema validation are not typically something your VC cares about since it treats everything as text. So I'm not seeing why that resolves the problem.
Except you were just talking about "merge conflicts..." two valid versions of the configuration don't necessarily stay valid when they get merged together.
Because a savvy developer will validate their schema when sorting out merge conflicts across the whole configuration, instead of committing a badly merged file that will fail to start up the Kubernetes cluster in the sea of YAML files.
I think it’s because XML was just that painful to deal with. When someone offered an alternative, any alternative, that didn’t look utterly crazy, people grabbed it with both hands.
All the languages/tools that use it as a config/data format were started during that period, before people realized that all they’d seen of yaml were toy examples.
I think YAML has a particular valid use case: short, well defined, human readable config files that need to be edited by less technical users.
The idea is that there's a single config.yml file in the project root. It's mostly empty but could grow to ~100 lines if all config options are changed, which realistically will almost never happen. Most times it'll be 0-20 lines long.
In that case xml would be needlessly confusing. It's not easy for non technical users to read.
Config as code is basically the default for (Python) Django applications, but for compiled language you would need to bake in a scripting engine or something. There's always sqlite, which is almost everywhere and very fast.
Lua is used quite a bit for that purpose, as it's small and easy to embed. I'm not a Lua programmer, but bodging together a few snippets of Lua with a well-defined configuration API is much easier than trying to do that with SQLite statements.
Indeed. About 15 years ago I picked up Lua for exactly that purpose on an embedded device that needed a somewhat sophisticated configuration. Readable. Comments included. If/else for a few things when needed. Was the best configuration experience I've had before or since.
I share the same sentiment. Provide frameworks not config DSLs.
People eventually want logic inside their config objects. We can argue how much logic but at a certain point, just using a popular programming language just makes sense because it’s familiar. I’d like my coding expertise translate into this part of the ecosystem.
That's where JSON shines: it's a properly "dead" format that can be trivially opted into allowing logic with the old console.log(JSON.stringify(...)) trick, without even touching the unaffected lines (and all the git history they might be associated with). It's certainly more trivial in some environments than in others, but I suppose it's close enough to the sweet spot of "adding logic shouldn't be too easy/shouldn't be too hard", whether the project is already nodeish or not.
The only downside is that if the objects in question happen to be properly documented in typescript, you will never want to go back.
If only there had been a formalized side by side from the start between JSON, the clean serialization format, and the JSON superset for human authoring (comments and quotes anarchy) that people have been informally reinventing hundreds of times...
I know that everyone hates YAML, but in my experience it's not that bad. I'm not using it to its full power, I guess, more like safe subset, and it works fine for me.
XML has its place, but I wouldn't want to replace my yamls with XML.
Github actions is fine provided you keep the YAML as short as possible and try to call out to a normal script to do everything - i.e. use it as config.
The problem comes when you try to use it as a programming language in and of itself (which MS encourages because it's a route to vendor lock in). It's a shitty, shitty programming language with shitty debugging tools.
To your second point: yes, yes, and very much yes.
Creating complex release workflows within GHA is hell given the debugging situation and that's what we're using because 'it's there, use it, everyone's using it'. Send help
I have literally left a 300k+/yr job, twice, due to having to write and maintain too much yaml.
Templating and significant whitespace is the worst. 10 commits in a row of "error on line 1" until helm finally parses what you want.
I am a software developer with extensive distributed systems experience. I like thinking about systems, not guess-and-check on config templates that become a bespoke DSL with no proper way to debug.
Our config structure requires similar but individualized configs for on-prem and the cloud. These can deviate for the region, the data center, the logical cluster, and at the individual node level for A/B or canary reasons. And of course local dev. You need to have a way to know where the deviations are, like different regional hostnames for services you connect with, different node sizes or replica counts, and different labels, any of which may or may not be correct and will cause an incident if wrong.
Search your YAML PRs, and I bet you will find a commit along the lines of "because yaml."
So a handful of well-known pitfalls and complaints about how it’s easy to represent the same information differently. Yeah, switching to XML definitely solves that problem.
I get your point but there is exactly no technology you can use without some known pitfalls, so you pick the one whose pitfalls are the least catastrophic or bothersome to you.
* JSON doesn't have comments. I could stop right there because that's a total deal-breaker for me for anything that's supposed to be read or written by humans.
* JSON doesn't handle multiline strings.
* JSON is not especially readable (no structure enforced, braces and double quote mandatory).
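A small invented fragment showing the first two gaps at once - comments and a literal multi-line block, neither of which plain JSON can express:

```yaml
timeout: 30   # seconds; comments alone are a deal-breaker for JSON
motd: |       # literal block scalar: newlines preserved as written
  Welcome to the build box.
  Maintenance window is Sunday 02:00.
```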
The lack of trailing commas isn't the problem. The problem is the intermediary commas.
Those are used in JavaScript to give you the option to provide an expression in place of a single term. But that's not even allowed in JSON, so the commas aren't serving any purpose at all.
It's annoying though because it's so mature compared to everything similar (even YAML, which I obviously quite like); like if you make an XSD for your config files everyone gets free editing features (not just validation, completion as well, even completion of attributes)
I have no big love for JSON or YAML, but I'd gladly take worrying about missing brackets over worrying about the handful of whitespace-related gotchas in YAML.
That's more of a problem with diff tools, though, which like almost all tooling and even programming languages, make the mistake of treating code as plain text.
OK, well, if you want to invent a new VC tool that can parse arbitrary languages into a syntax tree and make a useful diff and then convince everyone to use it instead of git then maybe I’ll stop using YAML.
I’ve felt like Python for config works pretty well, but I understand there’s a whole segment of folks who get twitchy at the thought of Python for anything. But just about every system has it or can install it, and you can do as much config-as-code as you want with it.
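A minimal sketch of that approach (the file contents and the uppercase-names convention are borrowed from Django's settings modules; everything else is invented):

```python
import runpy

# A hypothetical settings.py might contain plain assignments plus logic:
#     import os
#     DEBUG = os.environ.get("ENV", "dev") != "prod"
#     WORKERS = 4 if DEBUG else 32

def load_config(path):
    # Execute the file and keep only UPPERCASE names as settings,
    # the convention Django uses for its settings modules.
    namespace = runpy.run_path(path)
    return {k: v for k, v in namespace.items() if k.isupper()}
```

No parser to write, comments and conditionals come for free, and the result is an ordinary dict.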
> If you want config-as-code (and you want to!), just do it properly and use a proper programming language for it. Don't care which one, be it JavaScript, Python, Go, PDP-11 Assembly, or Rust. But please stop with these half-measure DSLs that just don't cut it.
I wish more tooling used Nix, it's a great lisp that isn't ugly.
Nix, the language? It's very good for derivations (big, composed k:v maps), but it doesn't feel like a lisp, nor would I want to try using it for docker-compose or a config for pre-commit or an omega one description of an ML model parameterisation.
Then again I'd love to be shown that this partially-considered opinion is wrong.
(Also, isn't Nix's inadequacy as a lisp one of the reasons for the existence of Guix?)
I actually use Nix to generate YAML files for GitHub Actions in one of my repositories. It allows me to share code between multiple actions and be consistent with versions/formatting.
Wow, this is truly terrifying. It reads as if Donald Trump was the maintainer of a popular library. "Low quality tooling! Sad!!!".
He claims with a straight face that every software using his library only loads YAMLs from sources 100% trusted to execute code. He is given example after example to the contrary, which he ignores instead opting to constantly blame "low quality tooling" for generating "false reports" about his perfect software. People can be really weird sometimes.
A lot of people misunderstand what he's actually saying.
There are two categories of constructors. One is for data that should not be executed, the other is for trusted data that should be executed.
There are two libraries. One has default constructors that can execute data, the other has default constructors that don't execute data.
He's saying to rtfm and choose the library with the correct defaults, choose the correct constructor from that library, and stop trying to take away the choice.
> He's saying to rtfm and choose the library with the correct defaults, choose the correct constructor from that library, and stop trying to take away the choice.
Nobody was trying to "take away the choice".
The problem is that you have to explicitly opt-in to be safe. If you followed the code snippets from the README, your application would be vulnerable to RCE without you realizing it; as people pointed out, it would be more secure to have Constructor (safe by default) + DangerousConstructor rather than Constructor (unsafe by default) + SafeConstructor.
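For context, the canonical demonstration of why unsafe-by-default matters, shown here in PyYAML's tag syntax (SnakeYAML's known gadgets work analogously, by naming Java classes in tags): a file that looks like data but runs code when loaded with an unsafe constructor.

```yaml
# Loaded with an unsafe loader (e.g. PyYAML's pre-5.1 yaml.load default),
# this instantiates the tagged object, i.e. runs a shell command:
exploit: !!python/object/apply:os.system ["echo pwned"]
```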
His argument was that "100% of applications using SnakeYaml do not accept untrusted data".
I understood him to be speaking tautologically, that you trust the data when you choose the trusting library without using the safe constructor, whether or not you realize the implication. He seems very well informed that some people are using the trusting constructors on untrusted data.
As he explained, this library is, by design, convenient by default. Those seeking safe by default should consider using the other library.
"take away the choice" is my summary of several comments that would have the feature removed. One was about how its existence is a vulnerability if file access is compromised. Another was about how code execution is not in the spec. And so on.
> I understood him to be speaking tautologically, that you trust the data when you choose the trusting library without using the safe constructor, whether or not you realize the implication.
He was speaking literally; he even rejected several of the provided examples because "users have to login first, therefore the data is trusted" which isn't an argument that any security-conscious person would make.
> As he explained, this library is, by design, convenient by default. Those seeking safe by default should consider using the other library.
That is a negligent mindset to have. Log4j added the ability to make arbitrary DNS and LDAP calls for the sake of convenience, which resulted in one of the most consequential vulnerabilities of the past decade.
Opt-in security is dangerous and should never be the default — especially when the feature in question is executing arbitrary input.
He also said to sanitize any data that you intend to use with the unsafe constructors. Taken together, he's pointing out that you decide how much you trust the data and you control which constructor to use. "Problem in chair"
"Should" statements are always relative to what you value. Clearly he thinks this trade-off is fine for him. His other library accommodates your security needs but this one accommodates his convenience needs. Can the man not make something for himself?
I assume it would be costly for him to make and propagate the changes. Maybe money could persuade him.
> He also said to sanitize any data that you intend to use with the unsafe constructors. Taken together, he's pointing out that you decide how much you trust the data and you control which constructor to use. "Problem in chair"
That doesn't change the fact that it's a poorly designed API that's insecure by default. There are countless situations where people are inadvertently exposed to risk via transitive dependencies, at no fault of their own.
> Can the man not make something for himself?
He did not make it for himself, he made it to be consumed by others. SnakeYAML is a widely used package.
He said he designed it for his use case of executing trusted configuration code, which some others appreciate. Obviously there was some misunderstanding about the goals and priorities of the project.
Making this change would cost him something he values without giving him something else he values in return. According to him, SnakeYAML Engine already provides a safe-by-default solution, so he's not leaving anyone without a remedy. It would cost you to switch, but you would get something in return. That seems fair to me.
why isn't the default secure? if the default isn't secure we have learned time and time again that people will use the default unknowingly exposing themselves to security holes.
Here's just a couple examples off the top of my head:
- `$variables` in bash are subject to arbitrary code execution via word splitting without escaping
- PHP register_globals
- PHP, express, and some others parse `?a[b]="foo"` in a query string as an object, allowing for prototype pollution or other exploits
- string concatenation for SQL + escape_string being the default for years
- perl array expansion in function calls
- XML entity inclusion on by default allowing you to read arbitrary files
- log4j executing arbitrary code inside its logs
- passing a variable to printf's first arg
- no difference between escaped and unescaped tags in php
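The SQL item in the list above is easy to demonstrate with Python's stdlib sqlite3 module: concatenation lets the input rewrite the query, while a bound parameter does not.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")
conn.execute("INSERT INTO users VALUES ('alice')")

evil = "' OR '1'='1"

# The insecure default of that era: string concatenation.
# The injected predicate makes the WHERE clause match every row.
rows_concat = conn.execute(
    "SELECT name FROM users WHERE name = '" + evil + "'"
).fetchall()

# The safe form: a bound parameter; the payload is just a string
# that matches no user, so nothing comes back.
rows_param = conn.execute(
    "SELECT name FROM users WHERE name = ?", (evil,)
).fetchall()
```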
XML does not, in a single line of code with no preknowledge of the document, deserialize into a map or array (of nested maps/arrays as needed). It cannot be mapped easily into domain objects/datastructs without extensive mapping info.
Instead, you need to describe the structure of the XML, have preknowledge of prefixes and meanings for namespaces, have to deal with CDATA crap, have directives, config-in-comments, and hosts of other annoyances.
XML sucks. I programmed from 1995 to the present. XML sucks. YAML is far far far far far superior.
The ONLY good thing about XML is XPath. That's it! XSLT? awful. schemas and other validations? horrid.
XStream (java library) was the only thing that made XML usable, and the second JSON (and later YAML) came out, I dropped it immediately.
That’s called “schema.” It’s a pretty out there concept, I know, the idea that you should be burdened to document the structure and intent of your data for both human and mechanical consumption. I realize I’m being forceful here, but keep in mind, you are compensated incredibly well. If it takes you another hour to save ten down the road, earn the pay. I don’t understand this aversion to tough stuff - which seems to be pretty popular here - and I’m starting to think I should interview for it a bit harder than I already do.
The problem with this thinking is that you, personally, are then forbidden from arguing for the use of a strictly typed language for development because it’s the opposite position to the one you’re holding here. The exact reasons we use languages like those are the same reasons we should be explicit with our schemas. It’s unfortunate that many people try to argue both sides due to the convenience, as you say, of a single line parse, when years of experience has taught that duck anything is a bug fountain. (Not saying you are arguing both, by the way, it’s just common.)
Try reading back your gripe with the following in mind: do I have a stronger complaint than “it’s difficult” here? I think you’ll find that you don’t convey one effectively.
I have seen the same aversion and lack of pragmatism so many times it has started to impact my motivation.
Examples:
1. When taking over a project, developers glanced at the code and decided it would be better to spend 6 months rewriting from scratch. The end result was not more readable than the original solution and introduced a new set of issues.
2. Many put too much emphasis on the worst case scenario and do not consider the average case. I worked a lot with many different XML formats and most of them were OK. Not "fun", but simply OK. I have to admit that I did struggle with some complex files, but there were plenty of times where the XML was simple, readable and easy to work with
3. When comparing programming languages they often focus on a few features and don't think about productivity in general. Languages like Java can actually be very productive, even if your favorite language can reduce null checks.
There's a happy medium to be found here - balancing ease of use, while avoiding the bug fountain. Having said that, IMHO we should err on the side of avoiding the bug fountain. Make it as simple as possible, but not simpler.
Robbing the future with deceptive over-simplicity - by creating a bunch of future difficult debugging scenarios and possibly footguns - is the far worse evil, than missing out on the maximally-convenient onboarding (which can be foolishly optimized for, for the sake of short-term popularity). All such crap-tastic solutions will eventually need to be replaced again anyway, creating an endless, hellish, slow churn.
I like rigid schemas for write and supporting both loose and rigid reads. Generally the friction in a system like that is something like the ad hoc SRE trying to load up a type stack just to interpret a protobuf log. Or Tableau. Stuff like that. That’s where people get annoyed. I think you’re right and there’s a lot of unexplored directions of simplicity.
Computers that understand the shape of your data are very helpful friends when you’re pursuing goals like data locality.
The problem is that XML doesn't even map that well even to objects and classes, even with annotation. And XSD is quite a heavyweight format with terrible UX.
I'm pretty much consistently on the strict/type-safe side of the "should we have a schema" debate, but there are better options out there to maintain a consistent schema for data interchange.
JSON is simpler to map, faster to parse, simpler, more lightweight, and less dangerous to add to an online app[1]. You can also use a schema like JSON Schema for inter-app compatibility. It has replaced XML as the standard data interchange format for a reason. It's not great for configuration files, and it's definitely being overused nowadays, but it is a solid data interchange format.
Then you've got binary formats like Protocol Buffers which are even more lightweight and faster to parse and (generally) have schemas that map better to typed languages.
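For illustration, a small hypothetical JSON Schema for the kind of server entry discussed elsewhere in this thread:

```json
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "type": "object",
  "properties": {
    "ip":   { "type": "string", "format": "ipv4" },
    "role": { "enum": ["frontend", "backend"] }
  },
  "required": ["ip", "role"]
}
```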
I think the OP has it right: XML is very well-suited as a generic document format. I wouldn't compare it to YAML, because in a perfect world they shouldn't compete in the same categories: Nobody should use XML for configuration or YAML for documents.
And I also agree that there are better formats than YAML. I like the ease of writing indented multiline strings in YAML, but the fuzzy typing is pretty terrible. At least YAML 1.2 fixed the Norway problem.
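For anyone who hasn't hit the Norway problem: under YAML 1.1 resolution rules, a handful of country codes are booleans.

```yaml
countries:
  - GB    # a string
  - NO    # YAML 1.1 reads this as boolean false, not the string "NO"
  - SE    # a string
# YAML 1.2 only treats true/false as booleans; quoting "NO" is safe everywhere.
```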
> JSON is simpler to map, faster to parse, simpler, more lightweight
JSON is no simpler (nor more complicated) than XML, if you're using a library. It certainly isn't faster - a SAX parser is faster than a JSON DOM parser (and JSON streaming parsers equivalent to SAX are rare).
> It has replaced XML as the standard data interchange format for a reason.
the reason isn't technical. It's competency (or lack thereof). Most interchange formats are for websites in browsers, where JSON performs well, since there's no native way to parse XML in the browser. So that mindshare from the web has leaked out to other arenas.
That's crazy talk. JSON is very simple. The irreducible complexity of JSON is lower than irreducible complexity of XML, and irreducible complexity of XML is lower than YAML.
JSON is simple, but the common idiom is to read it all at once into memory and forget the difficult stuff (which encoding is this JSON in, anyway?). Then skip the other difficult stuff (is this a string or a date type?)... so it's faster (maybe) to parse, but it doesn't scale for large datasets and your application will need additional deserialization logic. A broad sweeping performance statement like this is just spreading FUD.
Regarding security: entity expansion bugs were fixed long ago. On the other hand, people still use eval() on JSON objects to parse them. So I don't get that.
JSON schema: which one? AFAIK there is no single JSON schema standard with the tooling depth and breadth of XSD.
Protobuf: nice but unreadable for humans. Might as well use CORBA or ASN.1 encoding.
- say you have an incoming data document. Say you need to programmatically read it / scan it / extract from it (think: cli and pipes).
You want to access this data for any of dozens of reasons. Graphs, logs, data points, transformations, data feeds, whatever.
I can do that task in json and yaml 10000% faster than with XML. With XML, you may have a schema (hope the document matches the putative schema!). Oh the schema is an http reference? Hope that still exists out there, the internet never breaks links. If you don't, well shit, is this tag beginning a list or a "subdocument"? Am I REALLY using the DOM api to step through nodes and attributes and CDATA? Guess I have to. There goes a day of coding.
Oh, in JSON and YAML, it's ONE LINE OF CODE to get it into something I can easily read, manipulate, analyze?
- say you have an upgrade program. it just needs to read in the old config file, rename some keys, add some new default values, etc. JSON/YAML? I can do that in stupid-simple code. XML? Well, I better hope there exists a library that loads this shit for me in my preferred language, or otherwise lots of fun with DOM. I forget, can I use regex to parse XML? (that is a joke)
- say I want to serialize an object graph pretty quickly for over the wire between languages. Do I want to write a complete XML mapping in two languages, or just do the one-line serialize, one-line deserialize? Yeah.
- say I want my config files to be somewhat extension friendly for plugins / extensions. XML parsing code? Yeah, that will be a ton of custom code. YAML/JSON deserializing to a map/dictionary? Oh, look at that, extension friendly code. Allow them to specify whatever json/yaml struct in their plugin section and pass it to the extension.
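The upgrade-program case above really is stupid-simple with stdlib json (key names invented for the sketch):

```python
import json

def upgrade(old_text):
    cfg = json.loads(old_text)
    # Hypothetical migration: rename a key, backfill a new default.
    if "hostname" in cfg:
        cfg["host"] = cfg.pop("hostname")
    cfg.setdefault("timeout_s", 30)
    return json.dumps(cfg, indent=2)
```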
This stuff happens with such frequency that I never, ever think "man I wish this was XML".
Do YAML/JSON have some issues? Do I wish XPath and some XML features had JSON equivalents? Sure ... very occasionally. Actually, never.
Where to begin...
If the schema is not there you are in the same position as when receiving some JSON or YAML: a bad one. Is 20230806 an integer or someone's idea of sending a date?
Parsing the data into a memory structure is a one-liner in any language. Assigning meaning, however...
An object graph between languages, as you describe, works only in a very small number of cases. Send your data to a mainframe and watch it disappear faster than you can send it.
Extensions are where XML shines: use an extension namespace. Hell, use a namespace per plugin. Unknown namespaces are normally ignored during deserialization if you use a schema, so no line of code needed at all.
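A sketch of the namespace-per-plugin idea (URIs and element names invented): a validator working against the core schema can ignore or lax-validate the foreign namespace.

```xml
<config xmlns="urn:example:app"
        xmlns:metrics="urn:example:plugin-metrics">
  <server ip="10.0.0.1"/>
  <!-- Plugin settings live in their own namespace; the core
       schema never needs to know about them. -->
  <metrics:settings interval="30s"/>
</config>
```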
All the issues you describe boil down, in my book, to: I don't know how to do this properly with XML, so XML is bad.
I think there's a happy middle ground that I'm shocked doesn't exist (or is so obscure I've never found it in the wild): a "default" schema that is compatible with JSON types and can be serialized the same way JSON is. Whenever this comes up, the on-disk format seems pretty irrelevant; people just want an xml.load that maps to their language primitives in a sane way.
There's this kind of stuff but it's niche and more convention than actual schema.
I think there could be, but it's ugly. The XStream library wasn't horrid, but it basically provides a way to serialize/deserialize from the java object defs, which is still a schema of sorts.
You know <key name="blah">key value</key>. It just highlights the extra verbiage, and pushes people towards "just use json/yaml".
But those are good things. Things should be mapped, otherwise you are building a house of cards that may or may not fall over in the future when someone makes changes without realizing the repercussions.
XML is great for interchange, integrations and specifications. (but config not so much).
Ehh, I don't know. Sometimes I want to put code in my config, but there's value in not. It's more portable and easier to parse with various tools and different languages if you stick to some format.
Hah, I wrote my comment before reading the top comment (yours) and I agree. Java PTSD ruined XML, but it is still - to this day - superior to everything else that came after it because of XSD / XSLT and the like.
> No offense to the creator of YAML, but: The fact that it became one of the de-facto standards for cloud tooling is an absolutely damning statement about the state of the industry.
I think the creation and uptake of YAML over something like s-expressions is an indicator that the influential practitioners in the cloud industry are all young and inexperienced.
If the creators of all these cloud tools display such poor judgment, it makes the tool itself suspect.
The Nix Language, while goofy at times, is built for config-as-code and is hiding a decent little functional language in what looks like just attribute assignments.
Likewise, CUE Lang is built for config (esp merging docs with shared refs) and is highly under-appreciated. You can express powerful computations if you puzzle over the logical inferencing for a bit.
And it always will be, cdktf is the same but on Terraform objects. At some point you have to reach the end and spit out the final list of stuff you want.
If you want full imperative just use the AWS sdk or Ansible but I think people have realized how not maintainable that pattern is.
Every time I need to do something in CDK I am reminded of https://xkcd.com/2347/ except the whole thing is my config and the tiny bit is the thing I want to understand how to change.
When programming in Groovy, using Groovy maps as config was pretty common, and allowed you to sprinkle a little bit of logic here and there if things got complicated. But it was just Groovy: no special cases, no weird escaping rules.
> The fact that it became one of the de-facto standards for cloud tooling is an absolutely damning statement about the state of the industry.
The state of the industry is... perfectly fine? I really don't get why people hate YAML.
To me the "programming language as config" idea is just like Lisp-like macro. Yeah it's powerful, but once you have more than 2 people working on it you get DSLs (s for plural).
People, at least Lisp people, like to claim the line between code and data is very blurry. In my experience in real world it's blurry 1% of the time. In most cases code is code and data is data.
All you need is a good XML editor. It could even make it look like YAML so that it is nice at read time, and even let you edit it like you would a YAML but save it as XML and enjoy the validation etc. tooling.
I've always been of the opinion that YAML is inappropriate as a configuration format, period. It has ambiguous parses, which is an immediate no-no.
I remember years ago I was writing my own parser for YAML and when I came across this problem I posted on a few mailing lists and got confirmation that the behavior is per-spec. I've never touched YAML since and still won't.
But I miss the days of XML, XSD, and dare I say it ... XSLT. XSLT is stupidly good at what it does as long as you don't abuse it.
The problem with using a general-purpose programming language for configuration is that you lose the ability to statically interpret it. Maybe one solution is to make sure the configuration SDK is fully side effect free, so that it's always easy to run the configuration with fixed inputs and get a deterministic output.
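A tiny sketch of that discipline (all names invented): the config is a pure function of its inputs, with no I/O or environment reads, so rendering it with fixed inputs always yields the same serializable output that can be diffed like a static file.

```python
import json

def make_config(env: str, replicas: int) -> dict:
    # No I/O, no clock, no environment lookups: same inputs, same
    # output, so the result can be rendered and reviewed statically.
    return {
        "env": env,
        "replicas": replicas,
        "debug": env != "prod",
    }

rendered = json.dumps(make_config("prod", 3), sort_keys=True)
```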
I will never understand your problem with it. JSON or YAML should be enough for 90% use cases. And INI or TOML the rest. Let the XML relic rest in peace please.
Bah. While YAML is far from perfect, it's fine for random config files written by humans. TOML is mostly better, but from its own homepage (https://toml.io/en/):
[servers]
[servers.alpha]
ip = "10.0.0.1"
role = "frontend"
[servers.beta]
ip = "10.0.0.2"
role = "backend"
is ugly to my eyes. And I'd rather swallow my own tongue than have to hand-edit XML in the most common case where there's not a dedicated editor for that specific doctype.
In fact, I'd echo the linked article's argument back: I don't know of a case where XML is the best option. For human-edited files, pick almost literally anything else. For serialization, JSON handles the common cases and protobufs-and-friends are better when JSON isn't enough. There's not a situation I can imagine where I'd use XML for a greenfield project today.
{
"servers": {
# Frontend server is called alpha
"alpha": {"ip": "10.0.0.1", "role": "frontend"},
# Backend server is called beta
"beta": {"ip": "10.0.0.2", "role": "backend"},
}
}
Or, depending on your preference:
servers:
# Frontend server is called alpha
alpha:
ip: "10.0.0.1"
role: "frontend"
# Backend server is called beta
beta:
ip: "10.0.0.2"
role: "backend"
All of them suck in their own way. All of them work fine with autocomplete, type analysis, and autoformatting. The more things change, the more they stay the same.
JSON lacks comments, that's the biggest differentiator in my opinion.
Your YAML example doesn't need double-quotes around the IPv4 addresses, but then very confusingly and problematically does need double-quotes around an IPv6 address, due to the colons.
This creates a serious footgun in Ubuntu netplan, leaving a server totally unbootable, but simultaneously not triggering "netplan try" as any sort of parsing problem:
And JavaScript doesn't need semicolons in all but three exceptional cases, but I still consider it good form to write them. Likewise, I consider it good form to explicitly quote values whenever possible.
Getting in the habit of not doing so will lead to schema violations, like the Netplan problem you linked, which can crash the program trying to read your config. If it bails out at an unfortunate time, like most networking tools seem to do, you'll need to use a recovery boot image or serial console to fix your config.
> This creates a serious footgun in Ubuntu netplan, leaving a server totally unbootable, but simultaneously not triggering "netplan try" as any sort of parsing problem:
Been there, done that. A good config format or linter should’ve complained and not let me commit such a mistake.
JSON is by far the easiest to reliably parse. It doesn't rely on tabs or spaces which YAML suffers from. XML is just more verbose JSON without an array object, and has some redundancy in spec which is not a good design.
Lack of comments for JSON isn't a huge issue considering you can make the keys fairly verbose. And it would actually be pretty easy to add comments to the spec, and parsers would still be backwards compatible.
It's my preferred configuration file format, it fixes all the problems I have with JSON (trailing commas, comments) without turning it into a mess full of gotchas like YAML.
A theory: they don't want third-party code in a hot path (they do care about performance in VS Code), they already have a very performant parser, and they don't want to add complexity there.
All of them are easy to parse. I personally prefer the clarity of indented YAML over the endless nesting of {} JSON brings, but all formats are easy enough to read or write.
Lack of comments in JSON is a huge problem for config files. It's not an issue if you're just exchanging data between APIs, but for config files, comments are essential.
There are some JSON variants that allow comments, but tools rarely specify which dialect their parser accepts. There are also workarounds that abuse the fact duplicate key handling isn't part of the spec by specifying each key twice, once with a comment and once with data, as most parsers only make the second key stick; those are even worse.
You can't add backwards compatible comments to JSON, there's no space in the JSON spec to retroactively insert comments somewhere. The closest you can do is the duplicate key trick, but as the spec doesn't state which of the keys to read as a value, that trick only works with specific parser implementations.
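A sketch of why the duplicate-key trick is parser-dependent, using Python's stdlib json module (which happens to keep the last occurrence):

```python
import json

# RFC 8259 leaves duplicate-key handling unspecified. CPython's json module
# happens to keep the last occurrence, which is exactly what the
# pseudo-comment trick relies on. Other parsers may keep the first key,
# or reject the document outright.
doc = '{"port": "comment: the admin API listen port", "port": 8080}'
config = json.loads(doc)
print(config["port"])  # 8080 under CPython -- not guaranteed elsewhere
```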
YAML has newlines that indicate the end of a field... unless otherwise specified, which then gets into issues with line endings. Indents can also cause issues, considering a space inserted somewhere by accident can mess up your whole document. JSON and XML rely on specific tags for elements, which are much more reliable, and thus easier and faster to parse.
You can easily add comments to the JSON spec by just requiring every parser going forward to handle them. Such parsers would still read old, uncommented JSON files just fine.
- introduce the least idiomatic form of YAML as "depending on preference"
- add an optional preamble to the XML example
- add comments when there were none, but also not to all of them
If you didn't artificially stretch the different examples to match, there'd be a much clearer difference between them all, especially considering the fact one of your three examples is a superset of the other.
The odd one out can't even capture an integer vs a string without a schema.
I was puzzled by the GP's choice to write the first YAML example in a completely unidiomatic way.
But I think the point of adding comments was just to show that comments are possible in some formats, and not others. Omitting possible comments from the XML example might have just been a sign of fatigue over this topic ;)
At any rate, I find the (idiomatic) YAML example to be -- by far -- the most readable of all, including the GGP's TOML example.
- To showcase to the many people who say JSON is more legible that you can use YAML as "legible" JSON
- Most XML files I encounter come in this format. You can skip the preamble but it wouldn't match my real life experience.
- All readable config files I encounter have comments. I forgot to add comments to the XML representation, but I can't edit my comment anymore. I think everyone who ever encountered XML knows how to add comments, though. JSON simply doesn't support comments unless you use a niche JSON derivative.
As for the string versus integer problem: you always need a schema, or you'll run into very funny problems down the line. None of these formats intrinsically know what keys refer to an object and what keys refer to a string, that's all based on your schema anyway.
"10" is a string. 10 is a number. [10] is a single number in an array. {"number":10} is an object.
You're conflating advice for databases with advice for data serialization formats: XML captures less information about the data it contains intrinsically.
Also please don't use YAML as "JSON with comments", you're just asking to run into some obscure bug/corner case
If you're willing to do weird things there's always JSON5
I've gotten bit by trailing commas enough times (both manual edits and writing generators) that I absolutely expect any reasonable syntax to tolerate them. It's just so much easier and more consistent to tolerate them.
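A quick stdlib illustration of the strictness (Python's json module, which follows the strict grammar here):

```python
import json

# Strict JSON rejects trailing commas -- a classic trap for both
# hand-edits and naive generators that emit "item," in a loop.
try:
    json.loads('["alpha", "beta",]')
    tolerated = True
except json.JSONDecodeError:
    tolerated = False
print("trailing comma tolerated?", tolerated)  # False with the stdlib parser
```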
That's what Clojure got right. Comma is whitespace. When you print a data structure, it has commas, but they don't affect reading of that structure. It's brilliant and practical.
It does. However, it’s much more common to edit the end of a list, in my experience. Still, a syntax that is entirely uniform (like trailing commas) is preferable, in my opinion.
JSON5 does, but most software just does plain and simple JSON. I haven't seen it used outside some Javascript webdev environments. The JSON5 docs also seem to be specifically targeting Javascript development.
If you're sticking to certain variants, you may as well use YAML, which supports JSON notation, as well as comments and various other improvements.
If you use python, there is an excellent json5 module. But true, it may not be as well supported by other languages.
I am not sure I may as well be using yaml. I don't like it for the multiple reasons in the OP and this thread.
If you are using python, I have found it to be quite easy to support both json5 and yaml, as well as converting between them for people who feel strongly about yaml. Not trivial but low effort.
None of them handles it well. YAML has 7 different modes for it, so you will inevitably mix them up and use the wrong one; otherwise it's actually the only option that supports it. JSON requires inlining \n. XML only does it with whitespace indentation.
For human-maintained config, TOML is only "better" when the structure is so flat that it's almost indistinguishable from an INI file.
Anything more complex and it becomes one of the worst choices due to the confusing/unintuitive structure (especially nesting), on top of having less/worse library support.
YAML's structure is straightforward and readable by default, even for fairly complex files, and the major caveats are things like anchors or yes/no being booleans rather than the whitespace structure. I'd also argue some of the hate for YAML stems from things like helm that use the worst possible form of templating (raw string replacement).
I'm with you on all that. I think YAML's fine, and I like it way more than TOML for non-trivial files.
I think Python's pyproject.toml is a great use of TOML. The format is simple with very little nesting. It's often hand-edited, and the simple syntax lends itself nicely to that. Cargo.toml's in that same category for me. However, that's about as complex of a file as I'd want to use TOML for. Darned if I'd want to configure Ansible with it.
Agreed. I do a lot of Ansible, and it took me a while up front, but I've become pretty accustomed to YAML, though I still struggle with completely grokking some of the syntax. But I recently took a more serious look at TOML and felt it'd be a bear for Ansible.
A few months ago I made a "mini ansible / cookie cutter" ( https://github.com/linsomniac/uplaybook ), and it uses YAML syntax. I made a few modifications to Ansible syntax, largely around conditionals and loops. For YAML, I guess I like the syntax, but I've been feeling like there's got to be a better way.
I kind of want a shell syntax, but with the ansible command semantics (declarative, --check / --diff, notify) and the templating and encryption of arguments / files.
> For human-maintained config, TOML is only "better" when the structure is so flat that it's almost indistinguishable from an INI file.
Agree. I've recently inherited a python project, and I'm already getting tired of [mentally.parsing.ridiculously.long.character.section.headers] in pyproject.toml.
Seriously, structure is good. I shouldn't have to build the damn tree structure in my head when all we really needed was a strict mode for YAML.
> I'd also argue some of the hate for YAML stems from things like helm that use the worst possible form of templating (raw string replacement).
I was literally speechless when I saw helm templates doing stuff like "{{ toYaml .Values.api.resources | indent 12 }}", where the author has to hardcode the indentation level for each generated bit of text like a fucking caveman.
The tiny examples might look kinda okay, but when someone has stacked 10 different patch operations in a single file, it gets a lot harder to keep track of what's going on.
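For contrast with the string-indentation approach above, building the document as data and letting a serializer emit it sidesteps the indent arithmetic entirely (a minimal Python sketch; the names and values are invented):

```python
import json

# Structural generation: assemble the config as plain data structures and
# let the serializer worry about layout -- no hand-counted indent levels
# as with string-substitution templating.
resources = {"limits": {"cpu": "500m", "memory": "256Mi"}}
manifest = {"spec": {"containers": [{"name": "api", "resources": resources}]}}
print(json.dumps(manifest, indent=2))
```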
“Nesting is bad” is such a simplistic take. Nesting is absolutely essential and inescapable. What that statement is really doing is placing a limit on what whatever it applies to can be used for. It would be better to spend a few more words expressing what you really mean.
Your comment is a simplistic take on "Nesting is bad" given the context.
It's not hard to infer that they're referring to nesting as a footgun: make it harder and you lose some power but you keep your feet.
Config files are a poor place for complex and deeply nested relationships. If it's not ergonomic to reach for nesting, people tend to be forced to rethink their approach.
The problem is "config" means different things to different people. Some people see config as "the collection of runtime parameters" basically a bank of switches: Pyproject.toml is config. Others see any form of declarative structured data ingested by a runtime as config: docker-compose.yml is config.
And of course to minimize impedance mismatch, the structure should be similar to the domain.
So yes I want a "config file" to handle at least a dozen levels of nesting without getting obnoxious.
Then I guess to frame it in your language: they want formats that encourage config files, not "config files".
And I don't disagree. The problems of nesting objects "at least 12 levels deep" aren't going to be solved by the right format. The tooling itself needs to expose ways to capture logical dependencies other than arbitrary deep K-V pairs.
What if your problem is best expressed as "arbitrary deep K-V pairs"? It's going to be more common than not, nesting really is that fundamental.
There is no escape, you can't win. If you want the nesting, and assuming you can't remove it from the problem itself (as you often can't, or at least shouldn't), there's only one thing you can do: move inner things out, and put pointers in their place. This is what we do when we create constants, variables, and functions in our code: move some of this stuff up the scope, so it can be used (and re-used) through a shorthand. It loses you the ability to see the nesting all at once, but is necessary (among other reasons) when the nesting is too large to fit in your head.
Of course once you do that, once you introduce indirection into your config format, people will cry bloody murder. It's complex and invites (gasp) abstraction and reuse, which are (they believe) too difficult for normies.
The solution is, of course, to ignore the whining. Nesting is a special case of indirection. Both are part of the problem domain, both are part of reality. Normies can handle this just fine, if you don't scare them first. You need nesting and you need means of indirection; might as well make them readable, too. Conditionals and loops, those we can argue about, because together they give a language Turing-complete powers, and give security people seizures. And we have to be nice to our security people.
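YAML's own anchors, aliases, and merge keys are one such in-format indirection mechanism (sketch; the `<<:` merge key is a YAML 1.1 convention that most, but not all, parsers support):

```yaml
defaults: &defaults
  retries: 3
  timeout: 30
servers:
  alpha:
    <<: *defaults
    ip: 10.0.0.1
  beta:
    <<: *defaults
    timeout: 60      # override one inner value without re-nesting the rest
```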
This is whining that people won't endorse a lazy, poorly scaling approach to an engineering problem... and justifying that approach by conjuring hypothetical whiners against a common, better scaling solution.
If you need 12 levels of nesting, add indirection, or live with the fact no one is designing formats to enable your oddball mess of a use case.
12 levels of nested braces in a single function is already a crappy idea: it's an even more crappy idea in a config file because of the generally inferior tooling, and now there's a downstream component that needs to change to support a cleanup (meaning it almost never gets fixed and the format just gets worse over time)
> For human-edited files, pick almost literally anything else.
I'd still take XML over JSON for human-edited files. At least XML supports comments.
> For serialization, JSON handles the common cases
Counter-point, JSON sucks and is way overused. The types are too fuzzy, the syntax too quirky, and validators/schemas are almost never present. You can bolt that all on, but it wasn't designed for it and it shows. It was designed to be eval()'d, which you should also never do because it's a terrible idea. It's flawed at the foundations.
Those are valid points, and while I have a different opinion, I can't say you're wrong about any of it.
But I will say that the first time I used a JSON API that had replaced an XML one, I almost wept with relief. Perhaps because JSON is so simple, it pushed APIs toward having simpler (IMO) semantics that were far easier to reason about. Concretely, I'll take an actual REST API (that is, not just JSON-over-HTTP) over the SOAP debacle any day of the week. I know you can serve XML without using SOAP, but to me they're both emblematic of the same mindset.
> Counter-point, JSON sucks and is way overused. The types are too fuzzy, the syntax too quirky, and validators/schemas are almost never present. You can bolt that all on, but it wasn't designed for it and it shows.
XML's validation, schemas and typing are far more complicated and equally useless - the impedance mismatch is too big, all they do is give you a whole bunch of extra ways to shoot yourself in the foot, particularly in the presence of namespaces. If you want something fully structured, protobuf or equivalent is the way to go (and converting back and forth between protobuf and JSON is relatively painless).
XML Namespaces are one of the worst anti-features I have ever encountered. I have yet to see a legitimate use for them, but they make parsing way more of a pain than it needs to be.
Also the use of attributes vs nested tags seems pretty arbitrary and in my experience attributes are hardly used at all.
They have a parse.y that sort of gets traded around and joins new projects. Nothing super formal, but it does mean most OpenBSD service configuration feels alike, while each config is still tailored to its application. And because it's properly parsed, the error messages can be better.
In fact that is my biggest beef about YAML. I mainly use it in the context of Ansible, and the parser usually has no clue where in the file the error actually is. You have to depend on remembering where you last edited to actually find the error. My other big problem with YAML is that the Ansible context is trying very hard to make it a programming language... And while it is an OK-ish config language, it is a terrible programming language.
In fact this is a common problem with many complex environments. They want to push this complicated setup into a config file and claim "look, it is easy, no programming required", when really what they have done is push a programming situation into the world's worst programming language. See also: XSLT.
# Diff for interactive merges.
# %s output file
# %s old file
# %s new file
merge="sdiff --suppress-common-lines --output='%s' '%s' '%s'"
It's useful the first time you dive in, not having to read the man page. But over time, the comments can get out of sync, especially if you don't carefully merge in the package maintainer's version with every update.
Do you mean non-json types? Because the supported types seem pretty straightforward. (besides perhaps supporting null bytes in strings in things like postgresql)
> the syntax too quirky
Care to explain? This has always seemed like one of jsons strengths. The syntax for what is valid is pretty straightforward.
> Do you mean non-json types? Because the supported types seem pretty straightforward. (besides perhaps supporting null bytes in strings in things like postgresql)
What is a "number"? Is it a float? int? short? double? BigDecimal?
What about time values? Or dates? Oh, you have to just shove those into strings and hope both sides agree? That's fun.
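A small Python sketch of both complaints (the payload shape is invented):

```python
import json
from datetime import datetime, timezone

# JSON has a single "number" type and no date type at all, so both sides
# must agree on conventions out of band.
stamp = datetime(2024, 1, 2, tzinfo=timezone.utc).isoformat()
back = json.loads(json.dumps({"when": stamp}))
print(type(back["when"]))       # just a string again; the date-ness is gone
print(type(json.loads("1")))    # <class 'int'> in Python...
print(type(json.loads("1.0")))  # <class 'float'> ...but JS sees one Number type
```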
> Care to explain? This has always seemed like one of jsons strengths. The syntax for what is valid is pretty straightforward.
One example is the json "spec" on json.org does not allow trailing commas yet many parsers do
Yeah, I personally find XML elegant and well-designed compared to the popular alternatives today. And things that supply what’s missing (for example, JSON-LD and JSON Schema) aren’t much less complicated than the XML equivalents.
Of alternatives, I think EDN is the closest to being a satisfactory replacement because it supports namespaces.
Tangentially: there are some objects in the Kubernetes configuration that require the data to be base64 encoded (I think it's secrets and config maps, probably something else). When I was preparing for my CKA certification, I used the (most?) popular course that introduced base64 encoding as a _security measure_. I think that also says something about the state of the industry.
No no no, k8s's mistake was actually not using YAML hard enough. They built an object system on top of a format that can act as a typed serialization format for generic objects, and then decided to just ignore all that and implement it on top of primitive types.
in terms of something i'm shipping to users, i would much rather have to worry about the nuances of TOML than of YAML. with TOML, i don't have to worry about e.g. remote code execution because someone figured out a clever way to trick my yaml into running arbitrary code somehow. that kind of shit is annoying.
is it uglier? sure. but it's peace of mind...
in terms of config i'm using myself, say Kubernetes stuff, i really love YAML...because i know exactly what it's doing and i generally keep things simple. it's nice for that, it just does way too much IMHO...
I love it. Was so happy to see that syntax when I first encountered Toml. Hierarchy is always clear without having to scroll, and snippets retain context.
> I'd echo the linked article's argument back: I don't know of a case where XML is the best option.
From the article:
> “I’m making a new kind of book, and I need to annotate all of the verses in the Bible, and have the chapter headings and stuff.”
XML is a markup language, and works great for marking up text. It came out of SGML and attempts to make machine-usable documentation.
I'd favour its use in something like a datasheet, which is a combination of human-readable information, nested objects, and lots of stuff that needs to be machine parseable in fairly precise ways.
IMO it also works "fine" for other structured document formats that aren't text, like SVG, but that's not a strong opinion. JSON and other formats compete more sensibly here, but I'd never want a protobuf-based format for... writing an essay, for example.
XML is pretty good for information extraction with LLMs. It will parse your input text into a tree structure. JSON and YAML will conjure slightly different skills from the LLM. Maybe it all comes from the slightly different applications of these formats in the training corpus.
A flat TOML file is indistinguishable from INI except on one point: you can create a pure key-value file without sections, which most INI parsers I've seen won't accept.
And yet it's the only one out of all those mentioned that can be written with ease by humans. After a short while you can do several levels of nesting without thinking about it. For all its issues, it's by far the most practical which is why it won.
Practical but incorrectly used. We should correct the misunderstanding so that people don't continue to make similar mistakes, inconveniencing themselves and others.
i agree, while YAML has its problems, it is good for human-written content, especially when it's not too deeply nested.
in short: all these have their use-cases. i use YAML where it fits and TOML where it fits better. I never use JSON because JSON is just for machines, I generate JSONs.
I, for one, am displeased with .yaml. I recently had a major footgun incident where a Debian VPS was rendered completely unbootable because of Ubuntu netplan's highly-annoying use of YAML (where I had done the slightest misconfiguration, and to my eye it looked perfect, and my changes successfully passed the parser of "netplan try"). Yes, that's right - the server needed to be rebooted in rescue mode; it wasn't just merely stranded with no working network interfaces, where the web-based serial console would have been enough to undo the footgun gunshot. Nightmare!
The solution was to uninstall netplan.io Debian package where it didn't belong - get that YAML out of there. My hosting provider, OVH, figured it would be a good idea to shoehorn netplan - with its accursed YAML - into Debian for Network configuration. Bad move.
+1 for TOML. I love its use in innernet config files to set up new clients with a single generated "invitation" file.
I've broken plenty of networks by adding typos to /etc/network/interfaces and then running ifdown;ifup. Misconfiguration can happen no matter what format your config files are in.
Yes, without the quotes, the IPv6 address gets interpreted as a YAML mapping/dict because of the colon(s).
Perhaps the trap is the complacency that YAML induces by not requiring quotes around keys/values, and so text risks being interpreted in unexpected ways. The infamous Norway Problem has the same root cause.
Even in the first position? That makes it no longer be the claimed "superset of JSON", since (AFAIK) {"key":"value"} (with no whitespace) is valid JSON.
For delimited collections, so within {} and [], if the key is quoted, then you don't need the whitespace. So your example parses as expected as does `{"key":value}`, but `{key:value}` turns into `{"key:value": null}`.
That's odd. Netplan looks like it uses libyaml, and vanilla libyaml definitely parses that naked v6 addr as a plain scalar. Maybe Netplan adds an extra schema on top or something? If you know what data the bare v6 string turned into, I'd love to hear it.
YAML is like JSON, in that the format requires you to think about strings vs integers.
I bet the parser did actually fail, because you can't parse a malformed dictionary into a string. Your problem is that the tool probably took down the interface before trying to parse the new config, and then failed to bring the interface back up.
Similar to bogus data in /etc/network/interfaces, resetting the network interfaces with bogus data will end up with your server having no or limited connectivity capabilities.
At least with YAML there are command line parsers available to check your work. Plaintext config files often end up being a game of chance to see if you've got the format right.
Unpopular opinion I guess, but I really like netplan. I like having the vlan, bonding, ip, routing, dns configs all in one place, and it's worked well for me over 4-5 years.
In my rather short experience with netplan, I found:
As above, you can’t ask netplan to sanity check a config.
You can’t create a draft configuration, apply it, and save it if it works well. Every self-respecting network config system since at least Cisco IOS can do this (and does it by default!).
Interface renaming can’t filter by being a physical interface, which means that the system tries, and fails, to rename VLANs, because their MAC matches something that should be renamed. (networkd can handle this, but the networkd config written by netplan is wrong.)
Deleting virtual interfaces (e.g. VLANs) seems to be essentially unsupported, at least on 20.04. I think it’s slightly, but only slightly, better in newer releases.
Ouch, quite a lot of 'learning'! Color me unimpressed, too.
I've grown to enjoy NetworkManager. I know, like all things, that is probably controversial to some.
Two things I really appreciate about it:
- You can 'up' a connection/interface in an idempotent way; only changing whatever is needed.
- it's *very* scriptable. Values can be given with +/- operators
I was surprised/frustrated with networkd initially, but enjoy it now.
It getting involved with packet forwarding was an unwanted surprise during a modernization effort.
And then OVH makes you hand-configure your IPv6 address, with netmask and IPv6 gateway - no DHCP6 for their VPS servers. Then they put the footgun in your hand by not mentioning the double-quotes requirements which YAML has for IPv6-with-netmask.
When you have a linux-only scenario - say on your laptop, and servers, which is my case - innernet's simplicity and fairly-good elegance is tough to beat.
As soon as Windows/MacOS/Android/iOS clients want in on the fun, alas, you'll need something more complicated than innernet to accommodate these other clients.
XML's curse was that it started with all of the complexity that everyone will eventually add to any possible replacement. Worse still, people felt compelled to use every feature that they could for esoteric reasons. Even worse, the big companies did not actually work well with each other's document definitions. Even ones that were supposed to work well together.
That said, you only need to look at the abomination that is an OpenAPI spec that has been annotated to work with AWS to see that the pitfalls are still mostly the same today. (I can also complain about the breaking changes from the old specs.)
XML is full of complexity that it shouldn't have and any competitor would be stupid to adopt.
For example, either your format is a programing language, or the data evaluation must never need an internet connection. XML isn't one, yet still requires it.
If you want to implement the standard, your parser must be prepared to download extra data when the file requires it (it doesn't actually need to download the schema, even though that's the naive way to implement it).
A lot of parsers do not implement the standard, and they are better for it, because this can create huge security issues. But it's something you have to be always aware of, and could change on any minor version update.
Right, if you are willing to validate any and all random documents, you will have to have some sort of way to get the schemas. I can think of very few reasons to validate unknown schemas in an application, though. Would be like allowing your parser to take in external entities. Can be useful and there are valid reasons for trusted sources. But not for random documents from the web.
And much better at it, if only due to not shooting yourself in the foot by default with namespaces.
The problem with XML isn't that it supports schemas and namespaces. The problem is that it forces everyone to pay the costs of those things up-front, even if they don't use or care about them.
And because nobody wants to pay the namespace cost upfront, we all end up with vin, vin_no, viNumber, vinNum etc instead of iso3779:vin, for example. Every time we work with more than one API, it's prudent to try aligning the terms used by those APIs.
In my experience you have to have the duplication before you factor it out; trying to design those terms up front when you have no use cases, or even one use case, is doomed to failure. So you're always going to need to start with something non-standard, figure out the standard based on that experience, and then migrate to align to the standard.
Right, this is pretty much exactly what I meant by XML started with all the complexity that other things will add.
I will happily cede that data is a huge area where "duck typing" is far and away the correct choice. Taxonomies of data fail all the time. With odd rules that are largely defined by their exceptions.
XML is overwrought, YAML is a foot-bazooka, JSON lacks comments, TOML gets unwieldy with more than a few levels of nesting, EDN is neat but kind of obscure...
Maybe we need a new configuration format...
ducks
Actually I think something in the space of JSON5, Jsonnet, GCL, HCL, CUE, etc, will be the one to win out in the long run. JSON-but-fix-most-of-the-warts.
Then there's things which almost sit between fully declarative and turing-complete. Maybe Dhall, if it picks up a few more language implementations. Or CEL.
+1 for EDN. In terms of new formats, I'm quite fond of RON, it hits a similar sweetspot in flexibility. JSON5 is nice too, mostly because it's popular.
IMO Jsonnet, CUE, etc are just in a different category of complexity since they include code and require a full-blown interpreter to read.
Using Terraform/HCL as a pre-processor to generate any of the other formats seems like the best option now. It offers:
- No footguns (compared to the alternatives)
- Basic text-templating
- Structural manipulation of data, as opposed to many other config generators that only do string interpolation of text. Helm and j2... kill me please, whoever thought this was a good idea
- Basic looping, transformations, and function calls, without falling into the full imperative programming language trap. Configuration is still deterministic and directed.
- Support for loading data from other sources
- Support for comments
No it's YAML's fault - the fact that type coercion was once part of the serialization spec and not part of an adjacent schema spec is a tragic mistake. Now there's a bunch of outdated parsers that exhibit the issue and some that have it fixed. Meaning even more inconsistency and frustration. In an ideal world, YAML just shouldn't exist.
Even having any type coercion is a huge mistake. A true/false shouldn’t accept 0 or “false” with quotes or False with a capital f. It should be true or false. A single line string should accept one kind of quotes and one way of escaping.
Similarly, in XML it’s a curse that
<Foo><Bar>1</Bar></Foo>
and
<Foo Bar="1" />
are semantically the same but considered different in most systems.
Just avoid ambiguity and make it impossible to face a choice. The same semantic meaning should have one expression.
Oof. That's YAML 1.1. Need to use a 1.2 loader, which should have much better defaults around types. No more Norway problem unless explicitly opted into. We should badger^Wencourage our libraries to stop relying on a spec that's 14 years and multiple versions out of date.
The Norway problem is way overstated; people didn't learn YAML's reserved words and got mad when it did what it said on the tin. If you know the type, specify the type, and YAML will happily err when you put a square peg in a round hole. Better yet, actually validate the types when you ingest a document! Pfft, who would bother with that? Recent versions of Ansible will now force you to do it right and not accept numbers where it expects a string. I believe the Prometheus ecosystem does it as well.
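A sketch of what "specify the type" looks like: YAML's explicit tags pin the type no matter what the 1.1 implicit-resolution rules say.

```yaml
country: !!str no     # stays the string "no", not false
enabled: !!bool yes   # explicitly a boolean
version: !!str 1.20   # survives as "1.20", not the float 1.2
```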
When it comes to config formats this is pretty reasonable. People are just primed to expect true/false to be reserved.
---
enable_frobulator: yes
use_thopojog: no
This was also the issue presented in the OP, where they were mad that their version, which is a string, was interpreted as a number because it wasn't in quotes. Like, I don't know how you expected YAML to fix that for you. If you don't quote a number in TOML it'll also be wrong.
>>“I need to configure this server and the server needs to know if this value is true or false.”
>No, that’s bad. Don’t do that. That’s not a good use for XML.
First, the primary case that people use YAML for is not appropriate for XML? I don't agree with this, but taking it at face value, it creates a YAML strawman to attack.
Second, the rest of the argument boils down to "I don't like how floats are parsed in my language of choice". Guess what? You're storing your version strings in the wrong data type. Stop storing semver and its variants in floating point types; that's not what they're for.
Finally, this is ultimately a "considered harmful" article, which is always a red flag.
This is one of the few HN discussions where I feel a little bit qualified to give an opinion :)
Two years ago I started a small data quality checker where users could define their alerts, frequencies, etc., all in config files instead of modifying code.
I initially chose JSON as config format, but then realised comments are necessary to guide users in defining alerts. I moved to YAML, but after some "indentation incidents" started using HOCON conf [0] and never looked back. I don't see any reason for choosing YAML over one of JSON or HOCON, except being forced to because of some dependency. Features such as inheritance and text block support which were essential for me are nicely supported in HOCON.
hey! would be cool to chat about what you've built. we are currently building Keep (https://github.com/keephq/keep) where you can define alerts as YAML. would be cool to learn from you.
XML is well structured, typed and general purpose, but it's a pain for humans to edit. I think the editing pain is one of the main reasons people look for XML alternatives. Personally I wish XML was written a lot more like HAML.
XML is a data format for the correct exchange of information between computer systems. Human-friendly editing was never the number-one goal for its designers. And with good tools like IntelliJ IDEA, I have an easier time editing pom.xml configurations than k8s YAML files, to be honest.
Nah, I have an easier time editing XML in a dumb text editor than YAML with the best tooling I've ever found for it.
In fact, I have an easier time with any of its competitors people post around than with YAML. Which is distressing, because the language clearly has the goal of being easy to write.
Structured editors have been around since the beginning of XML but they have been slow to catch on generally and have never been that popular because people don’t feel in control.
Ah, kinda. Yes, they existed, but they frequently didn't have many of the more standard features like autocomplete (particularly on both sides of a new tag) and on-the-fly structure parsing that makes much of the painful parts easier to work with. I think we take for granted how good our modern approaches to IDE assistance can be.
Ever use the earliest IDE incarnations of Eclipse on a PC still measured in hundreds of Mhz, or when RAM was still in the dozens of MB? That was the world XML found itself in during its inception.
Electric tags in emacs make it mostly not a problem, and most editors have some equivalent: ‘it’ and ‘at’ motions in vim as well as vim-surround and emmet.
But, I agree that this is one of the worst parts of XML
If a majority of programming nerds weren't so averse to building UI to facilitate more complex actions with better guardrails, we'd probably be better off in many ways. Our desire to not standardize anything beyond "human readable ASCII-like" has held us back, imo.
Xcode's plist editing ability is a mild improvement over manipulating XML text directly, but could use more obvious shortcuts/hotkeys. Even writing XML in an IDE like IntelliJ isn't great, autocomplete should do a lot more.
Why would you need text-like XML if you get a good tool that converts to a more useful data entry format? Wouldn't you just use more efficient binary encoding for storage/interchange, while converting to something user-UI-friendly?
At least I've never had production errors because of malformed xml. Yaml, though.. A misplaced dash and your list of a single item is suddenly two distinct items instead.
While I definitely agree yaml is pointlessly prone to this ("NO" lol), I've had plenty of xml issues. Bad manual attribute encoding, duplicate attributes (pretty easy when there are lots) that vary in behavior depending on your xml reader, using text content instead of attributes, using lists of text nodes as a map that then gets deduplicated inconsistently instead of attributes (to work around needing to manually encode complex or multi-line text into an attribute)...
Humans can screw up anything. And more text often allows it to hide for longer.
Sure, if you define and use it. Same as yaml schemas (they exist!).
XML has them sorta built in (basically all (notable) libraries support them), but it's not like it's required or somehow innately protected because of that. It's just a bit easier to adopt.
But my case was with a yaml validated to a schema. A kube ingress file with a list of rules. My additional dash made it a new entry, in practice allowing everything. With xml it would have been very explicit that I now accidentally had made two rules.
I believe XML has it baked directly into the official spec. JSONSchema is great, but I wouldn't call it a standard yet, and anecdotally I haven't seen it as often as I'd like in use (i.e. I either need to run kubectl --dry-run or use a separate third party solution to validate my changed yaml).
Pro tip: remember that all JSON is valid YAML. You can put JSON in a .yaml file and be just fine. I find that handy when the explicitness makes it easier to read the file.
I once had an issue where something failed in prod but not in test, it was because a MAC address was dynamic and in prod only consisted of numbers so whatever tool we used parsed the Yaml value as a sexagesimal number and threw a type error. Yaml can be interesting…
Note that the behavior you describe (like the “NO” problem) was part of the YAML 1.1 excessive-effort-at-DWIM insanity (in this case, intended to make time entry “just work”) that was removed in YAML 1.2.
I wrote more about it on my blog: https://blog.kronis.dev/articles/ever-wanted-to-read-thousan... but the gist of it is that I had to parse thousands of blog feeds and some article from 2009 included a SOH control sequence inside of otherwise valid XML and this broke everything, until I added additional error handling.
Malformed xml sneaks through the kitchen window and poisons you in your sleep. Someone puts naively escaped umlauts (ä -> &auml;) into the surname field and your entire xml parser borks with unknown entity errors.
I have. A ton of vendors and APIs generate XML with interpolation or some other form of hand-rolled code that’s incorrectly escaped, which compliant parsers can’t parse and they can’t fix.
I would rather say that XML is untyped (an XML document only contains text of various flavours), and that is why the only concrete example given works in XML but would fail in YAML or JSON:
“test this against Go 1.20”
> It interprets that as Go 1.2.
Well, understand your data format before complaining. YAML, like JSON, understands data types. If you were to write {"value": 1.20} in JSON, it would similarly be interpreted as the numeric value 6/5. The only reason this works "magically" in XML is that XML itself doesn't have data types, it only has text, and the interpretation is left to the user rather than done by the parser.
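For the record, Python's json module behaves exactly this way (the {"value": ...} document here is just an illustrative fragment):

```python
import json

# JSON has a numeric type; trailing zeros carry no meaning.
parsed = json.loads('{"value": 1.20}')
print(parsed["value"])     # 1.2
print(json.dumps(parsed))  # {"value": 1.2}

# A version kept as a string survives the round trip untouched.
print(json.loads('{"value": "1.20"}')["value"])  # 1.20
```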
I don't understand why you are being downvoted. You're absolutely right, though I'd nitpick: not "untyped", but that the only types XML gives you are DOM nodes and strings.
Contrast that with JSON, which provides booleans, numbers, and the list and object composite types. (Side rant: as a standard, JSON does not define whether its numbers are integers, floats, or decimals! A conforming implementation can use whatever type it wants.)
Yes, XML has (multiple) schema definition languages that can be used to enforce that the strings can be coerced into specific types, but XML itself conveys no type information in-band about the values. I think this is one of the reasons it is difficult read and write by hand.
Nice, I wasn't aware of that. But it doesn't change the argument much: the XML document still only contains text data, and it's the schema validation phase that's responsible for converting the data into the correct format. Validating an XML document is an optional step, and I'm not aware of many tools that use XML as their config that perform full schema validation.
To add, the XML specification on decimal data types [0] explicitly says: Precision is not reflected in this value space; the number 2.0 is not distinct from the number 2.00 -- so a decimal data type in an XML document would have the exact same problem as the YAML example in TFA; the only difference is that with XML, the authors of the tool would have to actively shoot themselves in the foot by annotating that element as a decimal type rather than text.
The problem isn't that XML's decimal type doesn't distinguish between 1.2 and 1.20 - it's that versions aren't decimals in the first place.
The fact that versions often contain numbers separated by decimal points, that versions often have only two components, or that minor versions rarely exceed 9 for a particular product are merely coincidences.
My full sentence is more like saying Java is untyped if it were possible to run Java source files from the AST while skipping the type validation step, which seems pretty much a truism to me.
This is fine for me, but I cannot force others to do the same. I use emacs, have good eyes, and the ability to parse XML; most of the people I work with use less-than-ideal IDEs.
looks like json without commas. i like that. but i don't like that the outer keywords don't have a colon while the inner ones do. it feels inconsistent.
My motivation is expressing docker-compose.yaml in the "best" way I can imagine, as a design exercise. In that case, I'd rather have a bunch of (service ...) forms, instead of a "services" object. Not sure why. Maybe it's a pun on interpreter-driven formats like that used by [guix][1].
Configs for any mildly complex piece of software end up looking like:
30% implicit defaults that are waiting to break when you upgrade something
25% boilerplate
20% derivable from some other configuration but you just gotta set them all explicitly
10% cargo culted in through copy-pasting from the last project
5% can only ever be set to one particular value or nothing works
5% silently ignored due to typos - (some of these are causing subtle bugs and some are preventing them)
3% critical to actually doing the thing
2% used to have an effect but that was 3 versions ago
Of these flags and values only 40% have meaningful and correct documentation and 66% are different in staging and production but we are 85% sure that's ok.
Yes XML, YAML, and JSON have a ton of warts, but a good part of the problem is how we layer configurability into software systems in the first place. The serialization format can only be blamed so much.
The primary problem with YAML is that it allows writing strings without quotes. This causes confusion because it’s impossible to see, from just reading the YAML file, which strings become strings and which are interpreted as numbers or booleans.
For example, the identifier blah is interpreted as a string; but 01 is interpreted as a number, which makes it equivalent to 1. Similarly the identifier nope is a string while no is the Boolean literal false, and there’s no clear indicator that the former is a string while the latter is a Boolean.
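A rough sketch of how a YAML 1.1-style resolver guesses at these types (heavily simplified: real implementations also handle hex, sexagesimal, timestamps, and more, and the regexes here only approximate the spec's patterns):

```python
import re

# Illustrative subset of YAML 1.1 implicit-typing rules.
BOOLS = {"y", "yes", "true", "on", "n", "no", "false", "off"}
OCTAL = re.compile(r"[-+]?0[0-7_]+$")
DECIMAL = re.compile(r"[-+]?(0|[1-9][0-9_]*)$")
FLOAT = re.compile(r"[-+]?(\.[0-9]+|[0-9]+(\.[0-9]*)?([eE][-+]?[0-9]+)?)$")

def resolve(scalar: str) -> str:
    """Guess the type of an unquoted YAML 1.1 scalar."""
    if scalar.lower() in BOOLS:
        return "bool"
    if OCTAL.match(scalar) or DECIMAL.match(scalar):
        return "int"
    if FLOAT.match(scalar):
        return "float"
    return "str"

print(resolve("blah"))  # str
print(resolve("01"))    # int (parsed as octal 1)
print(resolve("no"))    # bool
print(resolve("nope"))  # str
print(resolve("1.20"))  # float -- the trailing zero is lost downstream
```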
I do think XML gets its bad rap from its abusive usage in RPC/services. Using XML in those cases resulted in bloat. But from a configuration-language or markup-language point of view, I can hardly find something that offers schemas for validation, IDE auto-completion (I used to use Eclipse and it filled out the enums and attributes very well), comments, and raw string data (use the CDATA tag) as gracefully as XML does. Go on, prove me wrong.
For something like rpc where data is automatically encoded/decoded then there are definitely better alternatives if you don’t need string representation of the entire blob. Eg: protobuf has a schema and code generation for most languages that can then work with whatever editor you use to write code.
I think editing xml is just as bad as editing anything else and is only made better by having some kind of plugin which understands xml + a specific schema.
It is, as it should be. I'm struggling to format it nicely here on HN, but I believe that the formatting shown here [0] is more readable than JSON or YAML. It gives you type flexibility (instead of JSON's string-only keys). What's the source of your confusion here?
Per that style guide, the above map should be formatted like this:
{:a 1
2 :bar
[1 2 3] :baz}
Most maps written in EDN have keys of a consistent type. A map whose keys are consistently keywords would look like this when formatted:
{:a 1
:bar "https://example.com/"
:baz [1 2 3]}
Or like this, when condensed to one line:
{:a 1, :bar "https://example.com/", :baz [1 2 3]}
The first map had keys of three different types: keyword `:a`, integer `2`, and vector `[1 2 3]`. Why would one want a format that supports maps with mixed-type keys? As a contrived example, mixed-type keys let you define a sparse 2D tile-based map for a game that is indexed by x and y coordinates:
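The example itself didn't survive in this thread, but a hypothetical Python analogue, using coordinate tuples where EDN would use vectors as keys, might look like:

```python
# Sparse 2D tile map: coordinate pairs as keys, tile names as values.
# Only occupied tiles are stored; everything else defaults to "empty".
tiles = {
    (0, 0): "wall",
    (0, 1): "wall",
    (5, 3): "treasure",
}

def tile_at(x: int, y: int) -> str:
    return tiles.get((x, y), "empty")

print(tile_at(0, 0))  # wall
print(tile_at(2, 2))  # empty
```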
The confusion is that the order of the key and value can be reversed, and commas are optional, so there's no way to tell at a glance what e.g. the value associated with :bar is.
This is the case with any hash-map in any language, modulo some of them making certain keys illegal.
But "string":"string" can be reversed in pretty much any language's hash-map.
> there's no way to tell at a glance what e.g. the value associated with :bar is.
You've certainly constructed and formatted a hash-map that is a little tricky to understand, but (1) this combination of key and value types is going to be pretty rare in the wild, and (2) to the extent it exists, people would format it to be easier to parse, either by adding commas or newlines.
I use YAML for config files, and I think that's what most people use it for. So, I thought this was going to be an argument for using XML for config files, but it's just bashing YAML.
YAML is easy to read like TOML or INI, has comments unlike JSON, and has dictionaries unlike TOML or INI. It's not bad.
It's not YAML's fault they didn't quote their numerical strings.
> It's not YAML's fault they didn't quote their numerical strings.
Yes, it is. If you design a format in such a way that type parsing is ambiguous, or in this case "keep trying to parse it over and over, starting with the most restrictive option and working your way down", people are going to commit errors. That's just life.
I would absolutely love it if YAML parsers supported a mode where quoting your strings was required. Or, while I'm wishing: a new YAML version that requires that (even though that's impossible to do backward-compatibly without yucky things like having to specify the YAML version in the document itself).
But I do still use YAML for config files, because there's enough about the other options that I don't like even more than YAML.
I for one appreciated the observation that if you're using XML for something that isn't a document, you're probably doing it wrong.
I recall (perhaps inaccurately) seeing that notion somewhere in http://www.catb.org/~esr/writings/taoup/ and it was at once a shock yet seemed so obvious. Might have been in something else esr wrote, but I've always considered that one to be his masterpiece.
Replying to my own comment. Just read through the TOML spec for the first time in years and its tables are essentially dictionaries. Don't know if that was added later or if I just missed it earlier. I'll consider using TOML for future projects. That said, I still think some of the hate YAML gets is unwarranted.
Why would anyone use it for serialization, when json exists? Yaml is harder to parse and the implementations have a history of security issues because it's so complicated. If you're not writing it by hand why bother at all?
It's a perfectly reasonable serialization format and works fine as such. It does not work well as a configuration format, because humans are not supposed to write it by hand.
I think a lot of YAML hate comes from the systems that use YAML (all those things that configure virtual machines, containers, cloud assets, etc.) that have bad data models to begin with and are part of bad architectures. (e.g. Hashicorp seems to come out with a new product every week to fix the problems with their old products)
I totally agree. I've used YAML for small config files on various projects I've written over the years. Stuff where it's maybe 50 lines on the extreme end, with a very simple data layout that doesn't get more than 2-3 levels deep. And for years and years I never understood WTF people were on about when they said how awful YAML is.
Then I used k8s for the first time. And after that, I understood, because k8s manifests are an abomination. Super error prone and hard to read. And I think that this has a whole lot more to do with the data model than the markup format. YAML isn't perfect (in particular, the way you specify an array of dicts is hot garbage and confusing), but it's not the main problem. The way the data is laid out in k8s would be awful to work with in any format.
While we're on spicy takes, k8s use of Yaml is an abomination and should have been vetoed early on. The data model they're using is too hierarchical and might have been better off in XML or Json
I think helm's use of YAML makes it worse because it uses a text template library to manipulate structured data, and thus requires annoying stuff like {{indent}} and {{nindent}} when you want to insert objects. To me, whenever I find myself using one of these indent functions I'm reminded of the well known pitfalls of using regex to parse HTML.
I don't know what I'd prefer, exactly. Probably JSON, or better yet JSON5.
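A toy illustration of the indentation problem, with plain Python string formatting standing in for the template engine (names and values are made up):

```python
# A multi-line block pasted into an indented context keeps only its own
# indentation, so every line after the first lands at the wrong level.
block = "replicas: 2\nimage: nginx"
template = "spec:\n  {body}"

naive = template.format(body=block)
print(naive)
# spec:
#   replicas: 2
# image: nginx     <- no longer nested under "spec"

# helm's indent/nindent functions exist to re-indent every line of the
# inserted block, roughly like this:
fixed = template.format(body=block.replace("\n", "\n  "))
print(fixed)
# spec:
#   replicas: 2
#   image: nginx
```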
Yeah, I often go that route when embedding objects in YAML. I haven't gone full JSON yet, but it's very tempting. A downside to that is having to convert back to YAML if you need support or want to easily diff against configs other folks have shared.
YAML bashing is sooo tiresome. Yes it has quirks and tries to do too much. And yet JSON and TOML have their own broken idiosyncrasies as well. But in the end they are all trivially interchangeable with each other for 99% of their use cases.
But XML is not even a comparable standard. XML addresses an entirely different problem space. It cannot be directly mapped into baseline common data structures in most other languages like JSON/YAML/TOML.
Saying XML is “better” than YAML requires defining your problem space. Otherwise it just looks like you don’t really understand the difference between the two.
Without the actual YAML ("test this against Go 1.20" would be interpreted by YAML as the string "test this against Go 1.20"), it's hard to know the actual complaint here, but it sounds like there was a YAML file with something like:
testTarget:
- language: Go
- version: 1.20
Where the tool interpreting it expected a version string in "version", but also accepted a number and implicitly converted the number to a version string. This is not a YAML problem, this is a "code that accepts and implicitly converts invalid data" problem. It's true that a schema and validating parser would help with this, and YAML doesn't have a broadly supported standard schema language, but I bet the real problem here was that the underlying tool was written in JavaScript, as it's the main popular language where even the most naive attempt to parse what you expect to be a string value would not fail if the value was actually a number.
> This is not a YAML problem, this is a “code that accepts and implicitly converts invalid data problem”.
No, this is a YAML problem, because YAML is specified to allow for unquoted strings, and then it uses heuristics to decide if you meant a string or a number or (god forbid) a boolean.
So the "code that accepts..." that you're talking about is literally every conforming YAML parser out there. And they do that because the spec tells them to. So yes, it is a YAML problem.
And to top that off, most of the examples and tutorials you'll find on how to use YAML don't quote their strings. I get the idea: fewer characters to type, more human readable. But god it's a minefield.
YAML is far too helpful in converting bare strings to what it thinks is right (like 1.20 to 1.2 or no to false). It could be helped by having data validation, but the fundamental issue is that YAML does not make it clear to either human or machine what is meant or expressed.
The issue is not isolated to JS either; the same could have happened in Python, PHP, or any other untyped language.
> YAML is far too helpful in converting bare strings to what it thinks is right (like 1.20 to 1.2 or no to false)
Versions of the YAML standard from the last 14 years don't do the latter, and supporting numbers as a basic data type is, honestly, a weird thing to harp on as "too far".
> The issue is not isolated to JS either, the same could have happened in python, php, or any other untyped language.
Indexing an associative array of runtimes that is keyed by a string, or doing almost anything else that expects a string, when you get a number instead, will fail with an error in most dynamically typed languages, including Python and PHP; many fewer string-expecting operations (and particularly not object indexing) will fail in JS when given a number.
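A quick sketch of the Python side of that claim, using a made-up runtimes table:

```python
# In Python, a number is not silently usable where a string key is expected.
runtimes = {"1.20": "go 1.20 runtime"}

try:
    runtimes[1.2]  # roughly what a YAML 1.1 loader hands you for `1.20`
except KeyError:
    print("lookup failed loudly")

# In JavaScript, the equivalent lookup runtimes[1.2] would coerce the
# number to the string "1.2" and fail only later, if at all.
```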
That looks like an annoying problem. Really need a decimal type for that.
As an aside, has anyone tried making a Python object notation? For instance, you'd get tuples, sets, complex, hex, etc. along with dicts, lists, and all the other stuff JSON has. I know you can use literal eval, but it has some security issues JSON doesn't have.
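For what it's worth, the stdlib's ast.literal_eval already covers most of that wishlist: literals only, no names or calls (though pathologically nested input can still exhaust the parser, which may be the security concern alluded to):

```python
import ast

# literal_eval accepts Python literal syntax: tuples, sets, complex
# numbers, hex ints, dicts, lists, and so on. The config string below
# is a made-up example.
config = ast.literal_eval(
    "{'retries': 0x0A, 'hosts': ('a', 'b'), 'flags': {1, 2}, 'gain': 1+2j}"
)
print(config["retries"])  # 10
print(config["hosts"])    # ('a', 'b')

# Arbitrary expressions are rejected rather than executed:
try:
    ast.literal_eval("__import__('os').getcwd()")
except ValueError:
    print("rejected")
```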
Realistically, a decimal type with defined precision is just a coverup for the real problem which is that version numbers aren't decimal numbers. If "1.2" < "1.19" < "1.20" that's not a decimal number and treating or storing it like one is a problem. I've seen this time and time again in software projects where someone treats the version number as a decimal, which works great right up until they have more than 9 minor versions, someone adds a bugfix version or someone adds a beta/alpha suffix. Then a whole bunch of stuff breaks that was making bad assumptions.
I find YAML to suck having used it in the serverless framework and many of my terrible bugs came from it. However, json is better than xml I think as a compromise between the two.
Good points re: XML and its misuse as anything other than a markup language (it's in its name, after all). After using things like HAML and whatnot for a few years, I went back to plain HTML. I like it much better.
YAML, meh, I choose to use it in Hugo because that's what I'm used to and I'd rather not learn a new config language until I'm forced to. I prefer config to just be in the language I'm working in, though many people disagree for various reasons but, you know, like, whatever, man.
The main complaint about XML (in the config-file context) seems to be the difficulty of editing it "by hand".
If you think about it, that's not a problem of the format but of the editors people typically use. Simple command-line text editors use a line paradigm rather than a tree paradigm. That's not well suited, and it's not the fault of the format.
Given that trees are about as fundamental as it gets as a data structure, maybe what is really needed to end these format wars is not a new format but a new editor or editor plugin.
This. There is no use writing "X is better than Y" without specifying what metric(s) you're using. If it's about human legibility or editability, YAML is better than XML. On the other hand, I would never use YAML as a machine-to-machine data interchange format; for those, JSON or XML are superior.
I also had to laugh at this statement:
the YAML specification has all these features that nobody ever uses, because they’re really confusing, and hard, and you can include documents inside of other documents, with references and stuff
That's a pretty funky argument to use in favour of XML.
As I am fond of saying, there's a common misquotation that runs "YAML is easy for humans to read". The full quote is "YAML is easy for humans to read wrongly".
KISS XML is the best, so glad to see people like the author coming around to it. JSON is fine for JS (since JS isn't likely ever to be gone from our lives), but YAML, as the author said, never has a case that makes it worth dealing with. Yet YAML is everywhere. I even use it in my personal Spring Boot projects, it's become so ingrained.
> Also the YAML specification has all these features that nobody ever uses, because they’re really confusing, and hard, and you can include documents inside of other documents, with references and stuff
XML also has these?!? In fact I'd guess YAML was developed to mirror the feature set of XML while having nicer syntax for humans.
"Better" is a loaded word. Behind it there is a lot of criteria, priorities, ways to measure, subjetivity and more. Your context for that word may be different than mine.
For those wanting human readable and editable configuration files, NestedText might be a solution. It only supports strings, but conversions can be done during import anyway.
NestedText is already the way I use YAML: everything is interpreted as a string. I have some trust in my YAML parser not to mangle most strings. I could use NestedText, but users would be unfamiliar with it, and IIRC the only parsers are in Python. But then I could use StrictYAML too: https://github.com/crdoconnor/strictyaml
It’s not just user error – it’s a problem with the YAML format and culture. The YAML format allows most strings to be written without quotes. And YAML culture, as communicated through most YAML snippets in documentation and YAML files in the wild, is to take advantage of this by only adding quotes when necessary. That makes certain categories of user error much more likely. Someone might write `09` and confirm it’s a string, then change `9` to `7` and not notice that their string has turned into a number.
The JSON format avoids this problem because it requires that all strings be quoted. If you start with `["09", "08"]` and change one of the digits inside the quotes, the string will stay a string. In JSON, strings becoming not-strings – removal of quotes or addition of backslash escaping – is more obvious than in YAML.
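A small demonstration with Python's json module (the list of zero-padded strings mirrors the example above):

```python
import json

# Quoted strings survive edits; an unquoted leading-zero token is a
# syntax error in JSON rather than a silent type change.
print(json.loads('["09", "07"]'))  # ['09', '07'] -- still strings

try:
    json.loads('[09, 07]')
except json.JSONDecodeError:
    print("rejected: leading zeros are not valid JSON numbers")
```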
“I need to configure this server and the server needs to know if this value is true or false.”
No, that’s bad. Don’t do that. That’s not a good use for XML.
But on the other hand, if you need to mark something bold, then XML is a great choice.
I see both cases as being quite equivalent: whether the server needs to know that a value is true/false, or that it needs to display something as bold, feels like much the same thing, doesn't it?
Or does he mean that you cannot easily retrieve the value of an arbitrary XML element? Whereas to display a document you just process it sequentially and do not need to 'retrieve' arbitrary data. Is this it?
I think I'm with the author on this. The two scenarios are very different. Bold text is probably a span of text within a larger block of text: delimiting it with an opening and closing tag makes sense. A config value is a boolean switch.
This makes sense:
<element>Some text here. <bold>Some more text here.</bold> And yet further text here.</element>
Oh, I was only saying that the article isn't claiming that you cannot easily retrieve the value of an arbitrary XML element. It's simply putting together selected examples that make YAML look bad and convince us that we must swear by XML.
Well, life isn't black and white, but somewhat greyscale. From my experience, none is better and they don't share the same use-cases.
It's born out of being a markup language, but its true strength is as a data-exchange format that can be self-documenting and human-readable at the same time.
Yes, it has a tendency to make people's eyes bleed with how verbose it can be. But if you were to open an XML file without knowing its format, or what each entry is supposed to be... it is likely you will understand (or be able to suss out) the purpose of the data being exchanged.
Schemas are very powerful, well-documented, and can be used to validate that an XML document is well-formed. People who tend to hate XML and Schemas tend not to understand how or why this is needed.
Contrary to the article's one point about config: I believe one of XML's strength is as configuration documents. It's better suited to configuration that is meant to be shared between different projects or platforms. If you only need a configuration file for a few keys and values - you're probably going to be fine with anything else.
But if you need to dump a configuration file into a document to be read by another program or exchange, then you're better off using XML because that format is more tolerant and can be well-documented by using Schemas. You can version those schemas, and use XSLT to convert the configuration document into another format that is far more readable.
That said, XML is not a be-all and end-all format for everyone. I believe, more importantly than anything, in using the appropriate tool for the given task. XML is simply not the best tool for every task.
So every XML file I've met has this boilerplate line at the top, something like:
<?xml version="1.0" encoding="UTF-8"?>
But more often than not, it also has a url.
But so does every JSON file I get from Azure. The other day I exported a Power App, and each app parameter/variable has its own folder with an XML file and a JSON file, which each reference versioned schemas and all sorts of stuff, all to essentially just say "key: value".
It is known as the XML declaration. It is not specific to any particular organization or group but is a standard part of the XML format itself. This declaration serves several important purposes:
Version Information: It specifies the version of the XML standard being used. In this case, it's version 1.0, which is the most common version of XML.
Character Encoding: It specifies the character encoding being used in the document. In this case, it's UTF-8, which is a widely used encoding that can represent a vast range of characters from different languages and character sets.
Standards Compliance: It signals that the document adheres to the XML standard. This declaration helps parsers and software that process XML documents to interpret and handle the document correctly.
Interoperability: By including this declaration, creators of XML documents ensure that their documents can be correctly interpreted and processed by a wide range of XML tools, libraries, and parsers.
In essence, the XML declaration helps ensure that XML documents are self-describing and can be processed consistently by different software and systems. It's a crucial part of the XML standard and is included at the beginning of most XML documents to set the context for how the document should be handled.
XML is very much misunderstood and this article is not an exception.
XML is a notational tool. A notational tool is a tool for a human to write something by hand and then process with a computer. The important part here is “by hand.” E.g. I’m processing a corpus of someone’s letters and see a phrase like “on Monday I saw him the last time” and I know that I need to mark “on Monday,” “I,” and “him” with references to a date and people I have inferred from elsewhere, and the only way I can do this is by hand, there is no automation. Yet once I place those references, I can mechanically index the corpus, which is my goal.
But writing a configuration file is the same task. Here again I am doing a thing that in the general case has to be done by a human, but the goal is to process the result mechanically.
All other uses of XML are a misuse. Yes, you can use it for data interchange, but it is similar to programmatically calling a third-party tool via a command line. Possible, but involves much overhead and is way less convenient than using that tool via a library. If the data are not generally composed by hand, then they should not be in XML. (But I would argue they should not be in JSON or YAML either; we do this mostly because we have no suitable tools.)
As a notational tool XML is actually rather good. Yes, it is verbose. You know what is not verbose? A special language you create for your special case. It will outperform any generic notation out there. If you decide to do that, then you will have to write a parser, process the text and get an abstract syntax tree. But note that XML is an abstract syntax tree. It is the intermediate result you get if you decide to solve your case in a perfect way. So maybe you could start with XML, get the syntax tree right, write the code to process it, and then see if you still want a parser.
On XML verbosity: what if we compared not the overall length, but the number of syntactic symbols?
I’ve extracted syntactic symbols on the right and you can see XML is much quieter.
On XML not being mappable to objects and arrays: it is straightforward if you remember that the data are supposed to be a syntax tree. They are indeed far from the final representation in the same way an arithmetic expression is far from the code that will evaluate it.
"So if you want to write YAML, you can. But it’ll just take that YAML and turn it into JSON behind the scenes. Then they also have a specific Caddy language. So you can give it the Caddy language and then it turns that into JSON behind the scenes. And you can give it an NGINX config and it’ll turn that into JSON behind the scenes. If you have the cycles and time to spare, that’s probably the best solution for most people…"
The fact that JSON doesn't support comments is so annoying, and I always thought that Douglas Crockford's rationale for this basically made no sense ("They can be misused!" - like, so what, nearly anything can be misused. So without support for comments e.g. in package.json files I have to do even worse hacky workaround bullshit like "__some_field_comment": "this is my comment"). There is of course jsonc and JSON5 but the fact that it's not supported everywhere means 10 years later we still can't write comments in package.json (there is https://github.com/npm/npm/issues/4482 and about a million related issues).
We actually offer both for configuration files of our software - YAML can be very pleasant to write and read for small datasets or those without much depth, without the overhead of curly brackets and quotation marks on everything. The original problem these formats solve is that people just want a simple nestable key-value structure/format, and both deliver on it - unlike XML. (When do you use an attribute vs. a tag with a text element?)
I know the history of XML, but just to get back to my original premise: why does a tag (a kind of key in the mind of most readers) have multiple "kinds" of values/children in parallel, especially one set that is map-shaped (attributes) while the other is array-shaped (proper children)?
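The ambiguity is easy to illustrate: all three of these spellings are reasonable XML for the same key/value pair (a hypothetical `server` config):

```xml
<!-- 1. As an attribute -->
<server port="8080"/>

<!-- 2. As a child element with text content -->
<server>
  <port>8080</port>
</server>

<!-- 3. As a generic name/value child -->
<server>
  <setting name="port" value="8080"/>
</server>
```

The format itself expresses no preference, which is exactly the bikeshedding problem.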
Not quite unambiguous! It declares how to parse numbers, but of course the "reference implementation" of JSON (insofar as there is one) is Javascript, in which numbers are floats and hence e.g. are bounded in precision, so the spec actually disagrees with the reference implementation.
I don't think this is true. The original specification (ECMA 404) only covers the formal language, i.e. what are valid documents and not what their memory representation after deserialisation is. A parser that rejects a number because it has too many digits would not be conforming.
rfc8259 is more detailed and it mentions that most parsers will have limited precision and references ieee754 to explain what the expected precision after deserialisation should be for most parsers.
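The mismatch is easy to demonstrate. A sketch in Python, whose `json` module parses integers exactly, versus what a parser that maps every number to an IEEE-754 double (as JavaScript's `JSON.parse` does) would produce:

```python
import json

big = 9007199254740993  # 2**53 + 1: not exactly representable as a double

# Python's json module parses integers exactly, which ECMA-404 permits...
assert json.loads(str(big)) == big

# ...while a double-only parser silently rounds to the nearest
# representable value:
assert float(big) == 9007199254740992.0
```

So two conforming parsers can disagree about the same document, which is what the RFC's IEEE-754 caveat is getting at.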
It is, but then you just use JSON, and can ignore YAML's broken design, plus benefit from the mass of more readily available and faster parsers!
I recently started a new small software project, and, out of habit, reached for YAML as the configuration file format. This article reminded me of the ways that I do really dislike YAML, though it didn't mention what is to me a more annoying parsing oddity:
country: no
YAML will interpret that "no" as a boolean "false". Which... c'mon.
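A quick check with PyYAML (which implements YAML 1.1 scalar resolution) shows the coercion; quoting the value is the only way out:

```python
import yaml  # assumes PyYAML, which follows YAML 1.1 rules

print(yaml.safe_load("country: no"))    # {'country': False}
print(yaml.safe_load('country: "no"'))  # {'country': 'no'}
```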
So I started thinking... maybe I should use something else instead.
TOML? Nah, my config file requires around 4 levels of nesting, and nesting in TOML isn't great, at least when done the more idiomatic way.
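For what it's worth, the idiomatic dotted-header style does reach four levels; it just gets noisy because every leaf table repeats the full path (these keys are invented for illustration):

```toml
[service.http.tls.ciphers]
preferred = "TLS_AES_128_GCM_SHA256"

[service.http.tls.certificates]
path = "/etc/certs"
```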
JSON? No, I want comments, and I hate having to double-quote all the keys.
XML? No, way too verbose; the author of the article goes into more detail why XML is bad for a configuration language.
HOCON? I've used it in some Scala projects, but I'm a little worried it's not mainstream enough and users might be confused at the syntax (or annoyed that they need to learn a new format, even if it's simple).
CUE? I'd never heard of it before it was mentioned in this article. "Validate, define, and use dynamic and text-based data" -- um, that sounds scary. I don't want a language, I want static, declarative configuration.
Do. Not. Write. Yaml. By. Hand. It is a serialization format. That means it is meant to be written to by a program, not a human. You will screw it up, a number of ways, if you write it by hand. It is only designed to be human-readable, not writeable. It is not a configuration format, it is a data serialization format.
I feel like we need that in giant bold red letters at the top of yaml.org. Nobody gets it.
YAML is the configuration format that is most comfortable to write by hand, IMO. Plenty of big projects use it for that use case. YAML also has plenty of different, redundant ways to represent the same data, which it wouldn't if it were a serialization format, no?
Besides, JSON is already human-readable when well formatted. Just lacking in hand-typing ergonomics.
I don't get what makes YAML a serialization format. And if it was intended to be such, then it sucks even more than most people argue it to.
You should probably use a search engine to discover what serialization is, and what a data serialization format is. It's a useful computer science concept.
Ignorance is only bliss until it gets you in trouble.
And yet you haven't presented alternatives. It's easy to dunk on something, harder to be constructive, I guess.
You may disagree with my reasons for not liking the others I mentioned, but that's just your opinion, and when I'm writing my own software, my opinion holds more weight.
What’s great about xmlstarlet is you can do a quick and dirty ad-hoc transformation on the command line, and when that starts getting too elaborate, you can generate the equivalent XSLT and use it anywhere.
Whenever this kind of arguments come up, I am sad that RON (https://github.com/ron-rs/ron) is not better known. To me it feels like a cleaner and better JSON.
In any case, my limited experience with it has made me hate YAML. Generally speaking, I have come to dislike any language with significant whitespace other than Haskell.
> It’s just too error-prone, there’s too many things… You just have to always quote everything.
Sorry, but this seems like a very silly, petty argument. There are other reasons to not like today's pervasive use of YAML, but this is not a very compelling one imo.
The same is true in XML, practically speaking, there just isn't another option - everything must be quoted or wrapped in tags.
So just ... use quotes in YAML. You can force it through a linter [0] if the optionality is what you're hung up on.
It's an abuse of Ycombinator servers that I can't paste a Billion Laughs statement that just fills up your parser with NONONONONONONONO for the next sixtillion eternities.
Because NONONONONONO . .
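For reference, the payload in question is just recursive entity expansion. Truncated to three levels it's harmless; the canonical version continues the pattern up to lol9, at which point the root expands to roughly a billion copies:

```xml
<?xml version="1.0"?>
<!DOCTYPE lolz [
  <!ENTITY lol "lol">
  <!ENTITY lol2 "&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;">
  <!ENTITY lol3 "&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;">
  <!-- ...continue the pattern through lol9... -->
]>
<lolz>&lol3;</lolz>
```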
OK seriously. Seriously. These people all chillin' here in the future and talking about how great XML is need to jump in my DeLorean back to 2002. Or maybe try using XML for a few years. Or decades.
* Schemas break XML *all the time*[1],
* "XML-aware" diff/merge[4],
* No Such Thing As Line Breaks[2],
* Sneaky proprietary entities[3],
* NAMESPACES,
* "1NF? Is that a sex thing?",
* Computability[4],
* FRICKIN CHARSETS,
* asemantic but pretends to have semantics,
* Hierarchy Fetishism
And so so so much more. The combined effect of this is that it reduces the volume of the tool ecosystem for a given XML spec. Don't believe me? Run the metrics on gitlab/github/npm/pip/DaSEA[5].
So the tool ecosystem is - very often - only as big as a singular project. It's one of the reasons there's so many XML editor vendors. In S1000D, it's pretty common to have a special vendor for each project.
XML completely nukes, by its essential nature, any possibility of using standard tools. There is an entire category of emergent technology - Lightweight Markup Languages - that were invented, by individuals, working for free, for no other reason than to get out of XML.
Using XML at scale is something that should never happen, for any reason, ever. It's this horrifying perfect storm of non-technical academics steering a crew of malcontents still angry at how GML went. Ol' Linus was a bit of a butthole, but he was right on the money when it came to XML. ALL of the problems YML has that are mentioned - they can ALL be found in XML, but they're magnified times fifty bazillion because of the inherent lack of support.
[1] Leading whitespace in attributes? HOW CHARMING. Yeah, that's in a schema, a very popular one. So each schema is its own language, and both DITA and S1000D allow for virtually any level of customization on top of that, and on top of THAT, in S1000D, you have to contend with each of the Issues. Seriously, it's a flashback to the pre-ATA100/JASC 1930s "shop manual" systems.
[2] No such thing as "normalized" when it comes to XML whitespace, which means no lines, no tabs, no spaces. Everything is elements. Oh, ha ha, unless there's dual-mode DTD/XSD validation . . which should REALLY have its own bullet. Do you realize, in any way, how incredibly radioactive external entities in a internet-facing parser are?
[3] REVBARS! Oh, and FRICKIN CGMs. Good luck processing those, because they were golden tickets handed out to ISO-favored software vendors.
[4] Infinite arbitrary nesting combined with whitespace agnostic means it's REALLY hard to make any sort of compute optimization unless you load the WHOLE thing into memory. An xml-aware git repository has performance several orders of magnitude worse than a normal one, and if used in quantity with goofball schemas, you can actually choke a Bitbucket CLI.
Man, I wish I didn't need DTDs. Unfortunately, the USAF TMCR says I do. Verbatim. TO-00-5-3.
Yeah, in retrospect Billion Laughs was a bit of a cheap shot. It is, however, hilarious. And no one ever put forward any sort of mitigation or fix, for decades[1]. Meanwhile, in the YAML dev world open issues . .
if (refDepth > maxRefCount && node.kind === Yaml.Kind.ANCHOR_REF) {
I don't really have a dog in the YAML fight - apart from Asciidoctor-pdf template files[1] - but the YML people are patching, and the XML people didn't, for a very long time.
Why is that? I'm going to go back to the basic notion of XML as the Everything for Everything, which was encouraged by its design pattern insistence on fake semantics. YML has, no doubt, a big ol' dose of the same sickness, but with a lot less overhead, and it makes maintenance easier.
Keep in mind, we're now debating "How YML is perhaps just as bad as XML"
[1] This has resulted in a lot of software and IETM files (even whole devices) getting pulled from USN vessels in theatre; there's more than a few vulnerabilities that ride on the SGML/DTD Billion Laughs. Bunch of other ancient file formats getting the same treatment, something we in the industry saw coming since 2007. Just a ticking bomb until you fight a peer.
Not a whole lot. And not just XSD, there's nothing either SGML or XML do that can't be done, fifty times faster, with fewer keystrokes, on standard - i.e., commodity, open - tooling, in Asciidoc (as it's deployed) or "Markdown" (with extensions).
USAF hasn't yet gotten nailed with the DTD attacks the way the USN was[0]. And that was a complete musterfluck. First they pulled all the handheld maintenance devices, then they basically mandated that all the stuff getting stuffed into entities could instead get shoved into a black box XML element stuffed full of Base64 or reference to an external binary or - hell - whatever you want. That's the current solution: the //multimedia element.
You'd think USAAF and USAA[1] would have learned something from this . .
[0] That's changing as we speak; DIA has a bunch of hardass new IT policies rolling out. God be praised.
[1] Although the USAA spec has more flex in it when it comes to geometry and other extremely specific rendering behaviors. It's much easier to optimize because it's not insistent that a frickin PDF parts catalog have draftsman-perfect line art.
Here's what the entities (specifically, CGM, the 800 lb gorilla of external entity references) do that can't be done in XML+SVG: ISO/IEC CGM:1999 line types (your dashed lines are exactly right); ISO/IEC CGM:1999 nurbs (so that the curves are just right). I have a bunch of counterarguments to these things and more, but the easiest one is : how much is a perfect dashed line worth? Is it worth twenty two million dollars? Because that's what it cost the Navy. That's assuming the PLAAF/PLAN doesn't hop inside your maintenance network off the east coast of Taiwan. Then you can buy your dashes at the reasonable cost of a few hundred dead sailors.
They need to swap out //multimedia for a standardized, text-based format yesterday, though. Either that or release an ISO profile for SVG, which honestly would be, like, a week's worth of work at most . . if you wanted to see it done, of course. Oh ISO Technical Steering, you and your loveable scamps made up almost entirely of stoneage software industry reps.
The problem with YAML as a configuration format can be boiled down to two things:
1) significant whitespace; that one is obviously subjective
2) encoding types in the document
When I’m reading the `maxItems` property from it, I know it’s an integer. Why do I need the document author to also tell me it’s an integer?
For statically typed languages, you have to tell it what type you’re expecting, so it can interpret the value at that point. `config.getInt("maxItems")`.
Even for dynamic language, most of the time you still want to validate the types up front to avoid an incorrect type blowing up in a random place in your code. So write a schema, use the schema to drive how the values are interpreted.
By replacing all the non-structural types with strings, you can have the clean, quoteless format they're after, but without any of the "Norwegian problem" issues.
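As a sketch of that idea with PyYAML (the loader name and wiring are mine, not a standard API): re-route every non-structural implicit tag to the string constructor, then coerce in the ingesting code:

```python
import yaml

class StringOnlyLoader(yaml.SafeLoader):
    """Resolve every scalar to a plain string; the ingesting code
    (or a schema) decides what is an int, bool, etc."""

# Re-route the implicit scalar tags to the string constructor.
for _tag in ("bool", "int", "float", "null", "timestamp"):
    StringOnlyLoader.add_constructor(
        "tag:yaml.org,2002:" + _tag,
        StringOnlyLoader.construct_yaml_str,
    )

doc = yaml.load("country: no\nmaxItems: 10", Loader=StringOnlyLoader)
print(doc)  # {'country': 'no', 'maxItems': '10'} -- no Norway problem
max_items = int(doc["maxItems"])  # the reader applies the type it expects
```

This is essentially what StrictYAML and NestedText (mentioned elsewhere in the thread) build in by design.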
> When I’m reading the `maxItems` property from it, I know it’s an integer. Why do I need the document author to also tell me it’s an integer?
Because this allows well-designed client libraries to detect conflicts between the document-author intent and programmer intent, enabling mismatches to fail as errors rather than being read other-than-as-intended.
> For statically typed languages, you have to tell it what type you’re expecting, so it can interpret the value at that point.
Yes and a client in such a language should fail (or have a mode in which it fails) if the value is not actually, as read in YAML semantics, a type compatible with what you are asking for.
> Even for dynamic language, most of the time you still want to validate the types up front to avoid an incorrect type blowing up in a random place in your code. So write a schema, use the schema to drive how the values are interpreted.
Schemas are for validation, not interpretation. If you use them for interpretation, then you get JavaScript-esque weak typing, and the errors that come with it (basically, magnifying the kind of YAML 1.1 problems that YAML 1.2 tamped down.)
> By replacing all the non-structural types with strings, you can have the clean, quoteless format they're after, but without any of the "Norwegian problem" issues.
What you propose is an explosion of Norway-problem-style potential for values to be interpreted other than as intended by the document author, not a mitigation.
It's a bit rich having the whole case hinge on the fact that YAML interprets numbers as numbers and not as strings. What happens when your Go version moves to 1.20.1...? You'll have to stringify it. In other words, version members should always be strings.
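The failure mode is easy to reproduce with PyYAML. Note how the trailing zero vanishes, while the three-part version is spared only because it no longer parses as a number:

```python
import yaml  # assumes PyYAML

print(yaml.safe_load("go: 1.20"))    # {'go': 1.2} -- float, trailing zero gone
print(yaml.safe_load('go: "1.20"'))  # {'go': '1.20'} -- quoted string survives
print(yaml.safe_load("go: 1.20.1"))  # {'go': '1.20.1'} -- not a number, so a string
```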
The problem is that document, serialization, and configuration formats are different use cases and need different languages.
We learnt that XML was a bit too verbose for serialization and moved to JSON. Now we need a good configuration format, especially for the advanced use cases, and YAML ain't it. The known ambiguities are a minor thing - the real problems are that it's not typed strongly enough and that it has significant whitespace. We need a language optimized for custom datatypes. That's why properly used XML is actually better here (it's better typed), but it's far from optimal. We could really use a different option.
I don't need to hear you out, xml *is* better than yaml. Yaml is just a random descriptor that can be ruined by a typo. xml has xslt and xsl, and what have you. It just happens to have a shape that went out of favor because of monstrosities like ESBs and Spring and people have PTSD. I can easily validate and transform my XML just by writing more XML which is awesome. This is why I don't use yaml if I can help it. If I want some descriptor JSON is still better since I can validate it with JSON Schema, or whatever validator I choose out of many. Yaml on most platforms is barely supported.
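For example, a few lines of JSON Schema (this schema is invented for illustration) pin down both structure and types, which is exactly what plain YAML won't do for you:

```json
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "type": "object",
  "required": ["maxItems"],
  "properties": {
    "maxItems": { "type": "integer", "minimum": 0 },
    "country": { "type": "string" }
  }
}
```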
Nothing like a good old type safe compiled language to cut down on the verbosity, copy paste usage, silly syntax errors, weird undocumented you just have to know the magical incantations, etc. Kotlin or similar languages are the way to go. Much safer, more compact, easier to cut down on the copy paste reuse (which is just miserable drudgery), easy to introduce some sane abstractions where that makes sense. You get auto completion. And if it compiles, it's likely to just work.
People keep on moving around the deck chairs on the proverbial Titanic when it comes to configuration languages. Substituting yaml for json or toml just moves the problems. And substituting those with XML just introduces other issues and only marginally improves things. Well formed xml is nice. But so is well formed json. Schemas help, if the urls don't 404 and you have tools that can actually do something with them. Which, as it turns out is mostly not a thing in practice. And without that, it's just repetitive bloat. XML with schemas becomes very hard to read quickly.
There's a reason, people started ignoring XML once json became popular: json does most of the essential stuff well enough that XML just isn't worth the effort. And if you have something where you'd actually need the complexity of XML, it's likely to be some really ugly bloated kind of thing where the last thing you'd want to do is edit it manually.
I've dealt with cloudformation in XML form at some point in my life. It sucks. Not just a little bit. It's an absolute piss poor format for a thing like that. Since such a thing was lacking at the time, we ended up actually building our own little tools to generate that xml. Hand editing it was just too painful. One mistake could corrupt your entire stack. And it takes ages to find out if you actually got it right. In Json form it's hardly any better. It's just one of those convoluted over-engineered things. Anyway, Json support for cloudformation was not there at the time and the difference is like asking whether you'd preferred to be shot or stabbed. It's going to suck either way.
Gonna date myself but, for ~80% of configuration needs good old INI never let me down. Named sections of name value pairs with string and number data types (maybe bools too. do not recall). et voilà!
Honestly if the config layout fits in ini, there's no reason not to use it.
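Python's standard library still ships exactly this, and note how type interpretation happens at the call site rather than in the format (section and key names here are invented):

```python
import configparser

cfg = configparser.ConfigParser()
cfg.read_string("""
[server]
host = example.com
port = 8080
debug = yes
""")

# Everything is a string until the reader asks for a type:
host = cfg.get("server", "host")           # 'example.com'
port = cfg.getint("server", "port")        # 8080
debug = cfg.getboolean("server", "debug")  # True -- 'yes' is opt-in here
```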
YAML is great for human-readable config, but there are footguns.
XML is terrible for human-readable config. JSON is not great for human-readable config. TOML is okay for human-readable config.
The problem is there's no clean way to abstract strings, bools, lists, objects, trees, etc. into a human-readable configuration syntax that does not have a footgun.
> The problem is there's no clean way to abstract strings, bools, lists, objects, trees, etc. into a human-readable configuration syntax that does not have a footgun.
My favorite take on this by far is that any types beyond string, list, and dict don't belong in the format at all, and should be left to the ingesting code. I started on the path thanks to StrictYAML and found a home with NestedText.
INI is great for some flatter types of configuration (and I still use it too), but once you need a little bit of nesting, or lists, INI starts to get cumbersome.
I recently developed a reporting feature to a system. The report had to be portable, both human and machine readable, and the human-part had to look nice to management people.
I considered generating 2 separate reports, in JSON for machines and in PDF for humans. The PDF part turned out to be difficult.
In the end I settled with XML. Its machine readable by nature and with XSLT it becomes human readable in the browser. The programming language provided XML encoder in the standard library which made the task very easy.
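The trick is a one-line processing instruction in the report (filenames here are hypothetical): browsers apply the referenced stylesheet and render HTML, while machine consumers ignore the instruction and read the elements directly:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="report.xsl"?>
<report generated="2023-06-01">
  <entry status="ok" duration="12ms"/>
</report>
```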
I don’t care which format is used for manual edits so long as it’s strict (no flimsy type coercions, possible to auto-format, possible to use a schema), and supports comments.
For wire/data formats I don't care at all. Perf/size and library support are the only factors there, but XML or JSON will do up to the point where I can use protobuf or something similar. I don't like its conventions/opinionated-ness though. The absence of a set of things is usually not the same as an empty set of things, for example.
I don't really know why YAML got so popular. It has tons of footguns, and it's being misused as a custom "programming language", making all the YAML soup unreadable.
It got popular because it uses significant whitespace, and a whole generation of developers who grew up on Python refused to touch anything that didn't.
It got popular because it's far easier to read and edit hierarchical and array data in YAML than it is in XML (or JSON).
Most of the time I don't need a format with support for some esoteric charset or namespaces or any of the other baggage that comes with XML, I just need a quick way to exchange regular, structured information, possibly with comments, in UTF-8, that is easy to read and write.
Show me a better format that doesn't make my eyes bleed from the angle brackets (or the curly braces from JSON) that is commonly supported across lots of programming environments and I'll happily switch to it.
Yaml does not have footguns. It has dumb humans who use it as a configuration format, when it is a data serialization format.
Using a car engine as a pizza oven would also result in problems if you tried to cook a pie with it. But having a toxic half cooked pizza doesn't mean the engine has footguns. The engine works fine if you use it to drive to a pizza place to pick up a pizza.
That person is using it wrong. They are writing the file by hand, violating the spec. Don't write yaml by hand. It is not designed to be used that way.
There's not much evidence in this article. It's always worked well for me, and being more readable saves a lot of time and helps me better visualize and comprehend what I'm working on. If there are more glitches like the version issue the writer ran into, I have never noticed them or had them significantly impact me. Maybe it's just that my infrastructure engineering use case hasn't run into them, but I need more evidence before I'll grab the pitchforks with you.
In these types of configurations, _everything_ should be a string and data types are parsed with an additional helper function. It was a mistake to have per-platform, per-implementation parsing.
> Now, I understand why people do YAML, but there are better choices.
JSON. XML is great, but a little too verbose and can be difficult to parse. JSON is simple, well structured, easily human readable (if a sane structure is used), agnostic to indentation.
I don't understand why HOCON (https://github.com/lightbend/config/blob/main/HOCON.md) isn't used more often (at least for configuration use cases). It's a superset of JSON, has comments, multiline strings, optional quotes, replacement syntax. We use it at many places, and it's as nice as it can get.
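A flavor of what that buys you (keys invented for illustration): comments, unquoted strings, and substitutions, while every valid JSON document remains valid HOCON:

```hocon
# HOCON: JSON superset with human-friendly additions
app {
  name = my-service                # unquoted strings
  data-dir = /var/lib/${app.name}  # substitution
  max-items = 10
}
```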
The cognitive load for yaml is infinitely lower than XML or other things that “work” better.
The example given about go 1.2 vs 1.20 is a bad one. Go should have gone with 1.02 instead of 1.2. Also, you can just convert that into a string.
The reason why yaml is great is because I’m tired of learning “better” solutions that only fix the tiny percentage of issues that yaml has, but has a much more tremendous learning curve. XML is overengineered to the point that it’s a confusing mess.
This is like saying apples are better than oranges.
One is a document markup language. The other is a data notation.
YAML is better understood as a family of notations at this point. For every gripe about the standard YAML implementation, there is a "safe" implementation that does not have that problem. You don't have the throw the readability baby out with the footgun bathwater.
OK, I heard him out and I didn't really hear much of an argument so much as handwaving about how YAML is bad. He's provided exactly one edge case where it didn't do what he expected because (I guess, if I understood the context) he used a number type where a string would have worked better. Which serialization format is immune to those issues?
Except he didn't use a number type, or at least didn't think he was: YAML allows you to write strings without quoting them, and has heuristics to decide if an unquoted thing is a string or something else.
For another example: YAML 1.1 would treat "yes" and "no" as boolean true and false. I've heard this called the "Norway Problem":
country: no
Whoops. I had never actually considered the issue in the article, and I wonder if I've ever made a similar mistake with numbers and didn't realize it.
Honestly I think I would be fine with YAML if strings were required to be quoted. Yes, that would make things a tiny bit more verbose, but it would remove a big footgun.
Yeah that probably would have been better. Can’t really change it now though. I think everyone who uses a lot of YAML is aware of this issue and would know exactly what happened immediately; it’s up there with Java string compares by reference in terms of “everyone using the tech knows this pitfall.”
I wrote a dns management system some time ago and used xml to describe the base data. I’m hard pressed to know of a better format where you can hand-edit, easily diff, have schema validation of data, and reliably transform the data into other formats (in this case bind, isc dhcp, and documentation with graphviz+html)
OTOH, I’ve seen xml schemas that would make a rattlesnake cry.
Hi I'm a singer.
My name is:
Paul my name is
My last name is:
McCartney my last name is
My children are:
My child is:
Her name is:
Heather her name is
Her age is:
15 her age is
...my child is
...my children are
...I am singer
Years ago when parsing XML in Java, to my surprise at the time, the parser by default would try to resolve external DTDs while parsing, ouch, what a way to let someone DDoS your system.
Unfortunately YAML was even worse in that regard, as it allowed arbitrary code execution as seen in recent CVEs...
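The PyYAML incarnation of this is well known: full `yaml.load` with an unsafe loader will happily construct arbitrary Python objects, which is why `safe_load` rejects language-specific tags instead of executing them:

```python
import yaml

payload = '!!python/object/apply:os.system ["echo pwned"]'

# yaml.load(payload, Loader=yaml.UnsafeLoader) would run the shell command.
# safe_load refuses the tag instead:
try:
    yaml.safe_load(payload)
    blocked = False
except yaml.YAMLError:
    blocked = True
print(blocked)  # True
```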
In terms of being able to extend the worlds most common markup language: it’s 2023 and you can make a <cat> element in your HTML and style it with cat{color:brown;}
We got there. We got the extensible markup tooling we needed. Gosh, it took a long time though.
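It really is that small now: an unknown element is just an unstyled inline node until CSS says otherwise:

```html
<style>
  cat { display: block; color: brown; }
</style>
<cat>Browsers parse unknown elements into the DOM and let CSS style them.</cat>
```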
> And I think the fact that now everybody is writing React… and they’re writing it with JSX… and JSX is basically just inline XML… I think that shows that there are cases where actually XML is pretty good
YAML and XML both get their reputation from the software that use them poorly. XML is a pretty decent markup format. YAML is a decent serialization format. Both are brittle when used as a human edited configuration file format.
So it really depends on your usecase. Do you need to be able to import several independently developed vocab and use them, possibly namespaced, in a single document... Seriously, go XML.
CUE does not integrate with XML yet because of these beastly features, in particular how to handle attributes on an object when it also has nested content. It's basically the same problem of how you would transform XML into yaml or json, though there are more options in CUE
I do regret somewhat the name calling. But if I were to publish an article with such incredible arrogance and pretension as to judge two of the major techs of the decade, and did so without applying either correctly, I would welcome some name calling. I think the intolerance to this is something of the neurotypical.
I like to set ChatGPT to aggressive, accusatory mode when I ask it technical questions. It's funnily angry for nothing, but helps you anyway (while despising you).
I think our industry's distraction with config file formats is due to trying to find a reasonable balance between human readable and machine readable. XML was great for machines, YAML is OK for humans, JSON/TOML are somewhere in the middle.
>Does this dude not understand what a data serialization format is, yet is trying to tell people how to design applications?
"Data serialization format" is irrelevant, both XML and YAML are designed for this.
It's just that XML was over-designed (with all the auxiliary specs), and YAML was designed badly from the start, even for mere configuration.
If someone thinks they refuted his points or preference because they "understand what a data serialization format is", they really don't understand XML/YAML and the domains in which they're actually used.
SGML might not have been designed for data serialization (it was designed for document authoring), but the XML standard committees and company representatives were heavily interested in generic data serialization.
And YAML was also designed for human readable editing and serialization of configuration among other data serialization needs. "Configuration files" is literally the first use case for it mentioned in the standard's homepage:
"Even though its potential is virtually boundless, YAML was specifically created to work well for common use cases such as: configuration files, log files, interprocess messaging, cross-language data sharing, object persistence and debugging of complex data structures".
But the point is moot. What each language was "designed for" is irrelevant to what it's predominantly used for throughout the industry.
People found a use for them, regardless of whether their designers foresaw it (they had), and for this use, which is what interests people, there are certain issues.
If the answer was as easy as "just use a language better designed for that use case and it will solve your issues", they would have done it already.
Some have their hands tied because vendors/projects/etc. they use enforce TOML or YAML or XML, so they have to use them too. Others find the alternatives worse for their use case, but still don't consider the one they use (TOML/YAML/XML/etc.) optimal.