
Sexps can be nice for a config representation. In particular, they imply less structure, so they work well in more cases: maps between objects (just use an alist), little DSLs in configs (so long as you’re ok with a lisp-like language), and “tagged objects” where you specify what kind of thing you’re describing followed by extra arguments, like enums in Rust. There are problems though:

1. Casing (some lisps are case sensitive others aren’t)

2. Lots of types of atom. Eg Common Lisp has like a dozen different types of number (short/long/single/double floats, integers, rationals, complex numbers thereof), weird syntax (eg you put a decimal point at the end of an integer to make sure it’s read in base 10; the fact that you have to worry it won’t be is already a serious concern), strings, and symbols.

3. Extra syntax/types, eg vectors, arrays, bitvectors, #., backtick and comma, quote, backslash rules (but not sure if there’s a standard way to escape characters), keywords, packages

4. Multiple similar things, eg vectors and lists, symbols and strings, alists and plists.

Many of these qualities may be useful in programming, but if one treats a config representation as a thing which must be validated and parsed into the actual data, then all of this adds confusion. I think there should be only one kind of atom: the string, which may be written without quotes. This way you can be flexible in parsing (for some fields you might choose to parse 50% as 0.5, and for other fields you might require that the number starts with a dollar sign so people don’t forget that it refers to some amount of dollars) and make it easy for people to write config files (no errors about expecting a string and getting a symbol or vice versa).
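As a rough sketch of the "only one kind of atom" idea in Python: the reader below returns nothing but nested lists of strings, and the per-field functions decide what each string means. The names `read`, `parse_fraction` and `parse_dollars` and the backslash-escape rule inside quotes are assumptions made for illustration, not part of any existing format.

```python
import re
from decimal import Decimal

# A token is a parenthesis, a double-quoted string, or a bare run of characters.
TOKEN = re.compile(r'\s*(?:([()])|"((?:[^"\\]|\\.)*)"|([^\s()"]+))')

def read(text):
    """Parse one s-expression; every atom comes back as a plain str."""
    text = text.strip()
    pos = 0
    stack = [[]]
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise ValueError(f"bad syntax at offset {pos}")
        pos = m.end()
        paren, quoted, bare = m.groups()
        if paren == "(":
            stack.append([])
        elif paren == ")":
            if len(stack) == 1:
                raise ValueError("unexpected ')'")
            finished = stack.pop()
            stack[-1].append(finished)
        elif quoted is not None:
            # Assumed escape rule: a backslash keeps the next character literally.
            stack[-1].append(re.sub(r"\\(.)", r"\1", quoted))
        else:
            stack[-1].append(bare)
    if len(stack) != 1:
        raise ValueError("missing ')'")
    if len(stack[0]) != 1:
        raise ValueError("expected exactly one top-level form")
    return stack[0][0]

def parse_fraction(atom):
    """Accept '50%' or '0.5'; both mean one half."""
    if atom.endswith("%"):
        return float(atom[:-1]) / 100
    return float(atom)

def parse_dollars(atom):
    """Require a leading '$' so the unit is never left implicit."""
    if not atom.startswith("$"):
        raise ValueError(f"expected a dollar amount like $12.50, got {atom!r}")
    return Decimal(atom[1:])
```

Reading `(item (discount 50%) (price $12.50) (name "blue widget"))` gives a list whose atoms are all strings, quoted or not. Only the code that knows a field is a discount or a price turns `50%` or `$12.50` into a number.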




Following up on the $ idea for dollar amounts: a thing I like in Common Lisp is the way the reader can be customized. Maybe this is a bad idea, but I would like to have user-provided data-type formats bound to prefix characters, or pairs of delimiters, etc., to avoid having to tag your values like this, as is done in JSON:

    (published 
      (type std.iso.8601
       value 2020-02-26))
Or like this, where "date:" is something only your application knows:

    (published date:2020-02-26)
Instead, you could rely on an externally specified format

    (prefix @ is std.iso.8601)
And use it in your file to parse text so that your application can build values of the proper datatype:

    (published @2020-02-26)
The application language would register lexers for those formats or fail the parsing step.

You could have fake parsers that just skip over the defined syntax if you don't need to process it in your code. Or, the way the syntax is defined could be such that it tells the lexer how to skip over a token even if the type is not useful for a tool's purpose (skip until a space, or parse exactly N characters, or "read until this delimiter with backslash being an escape character").

(probably reinventing the wheel)
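A minimal Python sketch of the prefix proposal, assuming the file has already been read into nested lists of string atoms and the `(prefix @ is std.iso.8601)` declarations have been collected as `(prefix, format)` pairs. The `LEXERS` registry and the function names `bind_prefixes` and `resolve` are hypothetical.

```python
from datetime import date

# Hypothetical registry: format name -> function turning atom text into a value.
LEXERS = {
    "std.iso.8601": date.fromisoformat,
}

def bind_prefixes(declarations, lexers=LEXERS):
    """Turn pairs like ('@', 'std.iso.8601') into a prefix -> lexer table.

    Fails the parsing step if the application has no lexer for a declared format.
    """
    table = {}
    for prefix, format_name in declarations:
        if format_name not in lexers:
            raise LookupError(f"no lexer registered for format {format_name!r}")
        table[prefix] = lexers[format_name]
    return table

def resolve(form, table):
    """Walk nested lists of string atoms, converting atoms that carry a bound prefix."""
    if isinstance(form, list):
        return [resolve(item, table) for item in form]
    lexer = table.get(form[:1])
    return lexer(form[1:]) if lexer else form
```

With `@` bound to `std.iso.8601`, `resolve(["published", "@2020-02-26"], table)` yields a `datetime.date` in the second position. A tool that does not care about dates could register a "fake" lexer such as `lambda text: text` under the same format name, so the file still parses and the value stays a string.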


My preferred solution looks like:

  (published 2021-02-26)
The program will get a list of two atoms, “published”, and “2021-02-26”. It can complain if you’ve eg written “published” when you should have written “submission_date” and it can complain if the next atom isn’t a valid date. You don’t need to tell the reader to parse something as a date because the reader isn’t best placed to know what should and shouldn’t be a date. Whereas the program should know exactly where it expects dates to be so it might as well handle parsing them so you don’t need to tag dates when you write your config.

Any human can read that date and know what it means, so why bother tagging it if the machine can also know it should be a date?
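To make that concrete, here is a small Python sketch in which the program, not the reader, decides where dates belong. It assumes the reader has already produced a two-atom list of strings. `SCHEMA` and `load_field` are hypothetical names, and the single `submission_date` field is taken from the example above.

```python
from datetime import date

# Hypothetical schema: the program knows which fields exist and how to parse them.
SCHEMA = {"submission_date": date.fromisoformat}

def load_field(form, schema=SCHEMA):
    """Validate a two-atom list such as ['submission_date', '2021-02-26']."""
    if len(form) != 2:
        raise ValueError(f"expected (name value), got {form!r}")
    name, raw = form
    if name not in schema:
        known = ", ".join(sorted(schema))
        raise ValueError(f"unknown field {name!r}; expected one of: {known}")
    try:
        return name, schema[name](raw)
    except ValueError as err:
        raise ValueError(f"field {name!r}: {raw!r} is not valid ({err})") from err
```

Here `load_field(["published", "2021-02-26"])` is rejected for the wrong field name, and `load_field(["submission_date", "next tuesday"])` is rejected for the bad date. The config file itself carries no type tags in either case.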



