I think that for C-like syntaxes that mostly have `(, , )` and `; ; ;` patterns -- i.e. opening and closing delimiters with tokens inside separated by separators -- the most rational formatting is really to push *all delimiters to the left and let the closing delimiter dangle*.
Indeed, it feels super unusual and tooling is lacking, but behold the concise rationality:
Beauty. N.B. how those opening and closing delimiters are "connected" by breadcrumbs of separators. Isn't it lovely?
Most developers I know will immediately light torches and grab pitchforks if you show them this.
(Personally, I sometimes use it for private doodling, and when reformatting obfuscated code or exploring complicated code, mostly just to improve my own reading comprehension.)
If you relax the rule to allow a "single simple value" on the same line (thus breaking the rule for "=" / ":" at the start of the line), it could look like this [1]
If the syntax allows for a trailing separator, I'd go with opening delimiters and separators on the right, with a trailing separator, and the closing delimiter on its own line.
The reasoning behind that is that it produces the cleanest diffs in source control, especially if you consider changing the first/last/only entry in a collection.
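A rough TypeScript sketch of that comparison. The three layout helpers and `diffSize` are hypothetical names for illustration; `diffSize` counts the minimal number of deleted plus inserted lines (via a longest common subsequence), which is roughly what a line-based diff reports.

```typescript
// Three layouts of the same list, each returned as an array of lines.
function trailingStyle(items: string[]): string[] {
  // Every item carries a comma; the brackets get their own lines.
  return ["[", ...items.map((x) => `  ${x},`), "]"];
}

function noTrailingStyle(items: string[]): string[] {
  // Same, but the last item must not carry a comma.
  return ["[", ...items.map((x, i) => `  ${x}${i < items.length - 1 ? "," : ""}`), "]"];
}

function leadingStyle(items: string[]): string[] {
  // Delimiter-first: "[" before the first item, "," before the rest.
  return [...items.map((x, i) => `${i === 0 ? "[" : ","} ${x}`), "]"];
}

// Minimal line-diff size (deleted + inserted lines) via longest common subsequence.
function diffSize(a: string[], b: string[]): number {
  const lcs: number[][] = Array.from({ length: a.length + 1 }, () =>
    new Array<number>(b.length + 1).fill(0)
  );
  for (let i = a.length - 1; i >= 0; i--) {
    for (let j = b.length - 1; j >= 0; j--) {
      lcs[i][j] =
        a[i] === b[j] ? lcs[i + 1][j + 1] + 1 : Math.max(lcs[i + 1][j], lcs[i][j + 1]);
    }
  }
  return a.length + b.length - 2 * lcs[0][0];
}
```

For a three-item list, `trailingStyle` changes exactly one line whether the first or the last entry is removed; `noTrailingStyle` changes three lines when the last entry goes (the new last line loses its comma), and `leadingStyle` changes three lines when the first entry goes (the new first line gains the `[`).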
Consider the diff from removing the first phone number from your collection above, and compare that to removing it from
Rational in the sense that the syntactically most significant characters are swept to one side (the left), so they can be observed (checked) at a single glance.
In contrast to the conventional ("irrational") zig-zag pattern, where:
- the opening delimiter is somewhere in the middle of the line (unless you use Allman or a similar style);
- separators are mostly at the right end of the line, sometimes pretty far away, depending on line length;
- the closing delimiter is mostly on the left;
- plus, separators and delimiters can be scattered along a single line if they "fit" there.
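The left-swept layout is mechanical enough to generate. Here is a minimal TypeScript sketch, assuming Elm-style output (opening delimiter before the first item, a separator before every other item, the closing delimiter alone on its own line) and JSON-like input; `formatDelimiterFirst` is a hypothetical helper, not an existing tool.

```typescript
// JSON-like values the sketch knows how to print.
type Value = string | number | boolean | null | Value[] | { [key: string]: Value };

function isScalar(v: Value): v is string | number | boolean | null {
  return v === null || typeof v !== "object";
}

// The returned text's first line carries no indentation (the caller has already
// positioned it); every later line is indented with `indent`.
function formatDelimiterFirst(v: Value, indent = ""): string {
  if (isScalar(v)) return JSON.stringify(v);
  const inner = indent + "  ";
  let open: string;
  let close: string;
  let entries: string[];
  if (Array.isArray(v)) {
    open = "[";
    close = "]";
    entries = v.map((item) => formatDelimiterFirst(item, inner));
  } else {
    open = "{";
    close = "}";
    entries = Object.entries(v).map(([key, item]) => {
      const label = `${JSON.stringify(key)}:`;
      if (isScalar(item)) return `${label} ${JSON.stringify(item)}`;
      // Nested containers start on the next line, indented one more step.
      const nested = inner + "  ";
      return `${label}\n${nested}${formatDelimiterFirst(item, nested)}`;
    });
  }
  if (entries.length === 0) return open + close;
  // Opening delimiter before the first entry, a separator before every other one...
  const lines = entries.map((e, i) => (i === 0 ? `${open} ${e}` : `${indent}, ${e}`));
  // ...and the closing delimiter dangles on its own line, under the opener.
  return `${lines.join("\n")}\n${indent}${close}`;
}
```

For example, `formatDelimiterFirst([1, [2, 3]])` yields `[ 1` / `, [ 2` / `  , 3` / `  ]` / `]` on five lines. The output is still valid JSON, since JSON allows whitespace between tokens.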
Again, I'm not saying this is "good and YoU ShoULD USe thAT!1!", only that compared to other conventions I see the fewest "rules" and "exceptions" in this one. And it is usable only in syntaxes with separators (JS to some degree, JSON, CSS), not in languages where ";" must terminate even the last statement in a block (PHP); there you'd have two dangling characters.
As for readability, I concur, yet I'd be very cautious about making judgments like "it is utterly unreadable". I know that readability is largely a matter of habit, and I wouldn't be surprised if, in a "Haskely" alternative universe where this convention had been the norm from the start, we'd scream in terror when confronted with "Prettier" ("Stroustrupy") conventions.
I'm not sure about the metric of "syntactically significant", though. Semicolons are definitely important for the compiler, but they're usually very unimportant for the human writer and reader of the code - if they are present, they can be ignored, and if they are not present, then the IDE can indicate that fairly reliably, regardless of where in the code it sits. When editing, they are rarely particularly present in my mind. And given that for the compiler - the only entity that does need to be aware of them - it doesn't matter at all where they go, it doesn't seem useful to emphasise them in this way.
It makes sense, but I really think it is a matter of habit. You can "ignore" separators and delimiters on the left just as easily as you ignore them where they are now. (There you can, for example, mentally "blend" them with the indentation.)
That "emphasis" you are talking about is probably just that they seem unusual so they draw attention.
Elm typically leans into this exact style (though without semicolons)! It's refreshing seeing support for this delimiter-first, dangling closing delimiter style in a more imperative and mainstream language.
If you turn `; ` into `;\n`, it is perfectly readable. When trying to sort out where a scope, call, or line actually ends, I like to do that, too.
But to make it more git-friendly, I either split `: [ "212 555-1234"` in two, or move the colon up, and often add a commented-out `//,` at the beginning or end for completeness where a separator isn't allowed there.
I am unsure who this is for. One argument for it seems to be that "delimiter-first" code is easier to parse by machine, but at the same time the author says that tools can be made to cope with hard-to-parse code.
The a) b) c) and 1) 2) 3) argument seems just confused - those are not delimiters.
The author claims that we don't need a terminator, but what do you do in the case of two consecutive sequences?
The author says that YAML is "delimiter-first", yet there is no \n at the beginning of a YAML file.
In the end it all comes down to putting an extra comma in front of everything you write at which point you might as well just parse everything backwards to achieve the same result.
Once I take all the circular logic out of the theory, I simply fail to see any benefits of this approach.
It certainly can be a delimiter, and indeed is often how we delimit sequences in normal language texts.
I've been experimenting with how this type of syntax could be used to transcribe Lisps, following on from the likes of 'Wisp'. The results are fairly interesting, in my opinion, though so far I've mostly been fiddling with the feel of writing and reading it, rather than actually settling on a coherent syntactic combination. I'm using bullets rather than numbered lists, since sequence is basically always implied in a code context.
> The a) b) c) and 1) 2) 3) argument seems just confused - those are not delimiters.
Hmm... I don't know about that. They fit many of the definitions I can find online:
> A delimiter is a sequence of one or more characters for specifying the boundary between separate, independent regions in plain text, mathematical expressions or other data streams. -- https://en.wikipedia.org/wiki/Delimiter
So the ) is the delimiter between the list counter and the item text, but the counter itself has meaning (indicating order/priority/importance) so I'd never consider that part of the delimiter.
I thought the same way before looking into it. The only context I ever see the word delimiter in is "X-delimited values", where X = comma, pipe, tab, etc.
Yup, it is a very confused text without any real point to it.
Edit: I have to revise that; it made me rethink something: maybe I should throw endOfLine out of my parser generator engine as a token, and introduce a startOfLine token instead.
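A minimal TypeScript sketch of what a `startOfLine` token could look like in a tokenizer. The `Tok` shape and `tokenize` function are hypothetical, not taken from any particular parser generator. Because the marker precedes each line's content, input with and without a final newline produces the same token stream, so the last line needs no terminator.

```typescript
// Hypothetical token shape for the sketch.
type Tok = { kind: "startOfLine" } | { kind: "word"; text: string };

function tokenize(src: string): Tok[] {
  const out: Tok[] = [];
  for (const line of src.split("\n")) {
    const words = line.split(/\s+/).filter((w) => w !== "");
    if (words.length === 0) continue; // blank lines produce no tokens at all
    out.push({ kind: "startOfLine" }); // the marker comes *before* the line's content
    for (const text of words) out.push({ kind: "word", text });
  }
  return out;
}
```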
Code formatting should be a non-issue. The way it should work is that when loading a file, your editor formats the code the way you feel most comfortable working with it. On save, it should be formatted according to the guidelines of the group who work together on the code (company guidelines).
Of course, that means that sometimes you have to look at code in the latter while you prefer the former. For instance, when you examine a merge conflict. But for the most part, in your everyday programming work, you only see code formatted the way you like.
Of course, this assumes that it's possible to programmatically translate between arbitrary code styles. This is probably true when we're only talking about differences in whitespace, but the story becomes quite different when we also consider things like naming conventions (different case styles: underscore vs. CamelCase, lowercase vs. uppercase first letter, etc.) and other things.
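To illustrate why naming conventions are harder than whitespace: with deliberately simple (hypothetical) converters, going from `snake_case` to `camelCase` and back is lossy, because `version_2` and `version2` map to the same camelCase name. A TypeScript sketch:

```typescript
// Hypothetical converters with deliberately simple rules.
function snakeToCamel(name: string): string {
  // Drop each underscore and upper-case the character after it: "user_id" -> "userId".
  return name.replace(/_([a-z0-9])/g, (_match, ch: string) => ch.toUpperCase());
}

function camelToSnake(name: string): string {
  // Put an underscore before each capital letter: "userId" -> "user_id".
  return name.replace(/[A-Z]/g, (ch) => "_" + ch.toLowerCase());
}
```

`user_id` survives the round trip, but `version_2` comes back as `version2`. A real tool would need richer rules (acronyms, digits, leading underscores), and collisions like this one are what make such translations hard to do losslessly.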
Not sure if it's still the case, but the Linux kernel used to have a script checkpatch.pl that was a few thousand lines of Perl, dense with regex, to check that patches conform to kernel style guidelines. And those guidelines are basically just K&R C style.
When I was a teenager I really wanted to get into contributing to the kernel, so I did this babby's first patch, fixing the code style for this rather large, extremely messy 3rd party driver in the tree. Must have taken me a couple days; hardly any single line conformed to the guidelines. Sent the patch in and Greg KH replied saying thanks and all, but we were planning on just deleting this driver anyway, so no merge.
Somewhat embarrassing intro to open source contribution :D
No idea what they do in the kernel now, but back then there was no automation, just that giant perl script that told you all the places where the style diverged from guidelines.
I don't know about the Linux codebase either, but a lot of languages these days tend to have a CST-based formatter: given a syntactically correct but arbitrarily formatted file, it will essentially rewrite it into a standardised form. There's a lot of cleverness going on on top of that (using the context of the code to format things in different ways), but the core idea is generally to parse the code and then format that, rather than apply purely text-based rules.
The benefit of this is that it almost cannot go wrong (unlike older formatters which might stumble over a line of code and accidentally write invalid or incorrect code). The disadvantage is that there are usually fewer configuration options, so if you don't like the formatting, you might just be stuck with it anyway.
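The "parse, then print" idea in miniature, as a TypeScript sketch for JSON only. `JSON.parse` builds a plain value tree rather than a true CST (it would discard comments, which JSON happens not to have), and `JSON.stringify` prints it back in one canonical layout with essentially one knob, the indent width. `reformatJson` is a hypothetical name.

```typescript
// Parse first, then print: the output layout never depends on the input layout.
function reformatJson(source: string): string {
  // JSON.parse throws on invalid input, so broken text is rejected, never rewritten.
  const tree: unknown = JSON.parse(source);
  return JSON.stringify(tree, null, 2);
}
```

Two inputs that differ only in layout produce identical output, and running it on its own output changes nothing. That is what makes this kind of formatter hard to break, and also why it offers so few configuration options.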
Meanwhile, Go has essentially solved the problem, and Rust copied the idea. Neither of those languages need to bother humans with the details of exact code layout. The post-C world is a lot better.
To expand further out: the reason this is a problem to begin with is that virtually all programming languages involve a compiler or interpreter parsing a text file which humans (with the aid of IDEs) must laboriously maintain.
Instead of arguing about code format styles, why not change our languages (or at the very least development environments) so that a list (for example) gets parsed, analyzed, and turned into an internal object, and doesn't remain merely a "dead" string of characters.
Once lists are "objectified", we should be using tools to modify those objects (similar to how you'd add/remove rows from a spreadsheet, or edit the DOM in your browser's dev tools). How those objects get presented visually to the programmer as source code is irrelevant at that point: Want to show the list as a single line of items concatenated with commas? No problem. Want to see each item on its own line, without commas? Sure. The visual representation of lists just becomes syntactic sugar, while the actual list itself is embedded in the development environment as an internal object, much as your HTML pages are parsed by the browser into the DOM, which you can inspect and modify via the browser's dev tools.
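A toy TypeScript sketch of that idea: the list lives as an object, edits are operations on the object, and the two layouts mentioned above are just different renderings. `ListNode`, `renderInline`, `renderOnePerLine` and `removeAt` are hypothetical names.

```typescript
// The "real" list: an object, not a string of characters.
interface ListNode {
  readonly items: readonly string[];
}

// One view: a single line, items joined with commas.
function renderInline(list: ListNode): string {
  return list.items.join(", ");
}

// Another view: one item per line, no commas at all.
function renderOnePerLine(list: ListNode): string {
  return list.items.join("\n");
}

// Editing works on the object; no rendered view has to be patched by hand.
function removeAt(list: ListNode, index: number): ListNode {
  return { items: list.items.filter((_, i) => i !== index) };
}
```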
We should be coding to build "smart objects," not writing strings of text to be endlessly curated.
> The way it should work is that when loading a file, your editor formats the code the way you feel most comfortable working with it.
This doesn't seem practical if you have to reconcile external sources of information with the file - for instance, if you're looking at line numbers in a stack trace from production.
In principle even that could be solved with better tools (source map files and the like), but I agree, until such tools are universally adopted we're probably stuck with having to work with and store the same formatted code.
As it is, though, correlating line numbers from production logs with the current source is often an inexact science, due to optimisations or difficulty retrieving the exact version of the code that was used to produce the build (either because you can't find out the exact build number, or because the correlation between commit hash and build number got lost somewhere, etc.).
That particular case (line numbers) is quite easily solvable in theory. But as I mentioned before, there are other problems if the automatic reformatting involves more than just fiddling with whitespace.
I've done SQL with an s-expression syntax, and found it at least as easy to read as normal SQL syntax. Do you have an example that would be harder to read?
I think the main issue is that talking by trivial examples does not communicate a coherent syntax. The audience can imagine completely different corner cases when they fill in the gaps differently than how you intended.
I don't think your example looks very s-expressiony with the FROM keyword in the middle of the sequence. To me, s-expr means the keyword in the first slot labels the form to interpret the rest of the sequence. The outer form should either have positional slots to implicitly interpret each part by different rules or an unordered set of slots that will each have their own labeled form. Consider instead something like:
Also, existing SQL syntaxes have some places where parenthesis has a special syntactic purpose and others where it is optional. It is rather convoluted to define unambiguous parses for some of these. For someone familiar with any of this, it would be hard not to bring in false assumptions about how a hybrid language would work that brings in parentheses for other structuring purposes.
How is that non-delimited? Spaces and parentheses are still delimiters. Non-delimited would imply fixed-width tokens, binary machine language would be one of the very few that qualify.
Comma-first is absolutely conventional in many parts of literally millions of C++ files, particularly for member initializers in constructor definitions, but often, also, argument lists.
It makes for cleaner edits and diffs.
If you are designing a syntax that, unlike Python, does not rely on line breaks, reach for "introducers" instead of "delimiters". Commas make awkward introducers, even though people can and do get used to them.
I write a lot of Verilog code, and most of its operators are the same as in C. Your first example doesn't highlight the real benefit of its style because all of your items are the same length. Here is a better example:
I tend to use that sort of layout, particularly the comma-first example shown in the article for SQL and JSON. It makes moving things around less error-prone (no need to worry about where the ; is when you reorder, add to, or remove from the end of a list) and makes diffs between versions cleaner for the same reason. And I feel it just seems clearer. Some people seem to hate it with a passion, though.
The issue with that layout (and with my examples) is that the first element is the odd one out, and you can't reorder it. I think it's better this way (usually the first element is special, and you usually add things at the end), but I prefer a trailing comma over a comma on the left. A leading comma, as the article discusses, would be a nice addition indeed.
I'm still annoyed that lists have delimiters at all. i.e. why not:
[ 1 2 3 ]
In particular, syntaxes that require a delimiter and forbid a trailing delimiter require much more verbosity in code generators.
This is maybe the only place where bash differs from other syntaxes but (IMO) clearly got it right.
I'm also reminded that Clojure uses a compromise where there are no delimiters, but a comma is treated as whitespace.
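A minimal TypeScript sketch of the Clojure-style compromise, for flat lists of numbers only: commas are treated exactly like whitespace, so `[ 1 2 3 ]`, `[1, 2, 3]` and even `[1,,2 ,3,]` read the same. `readNumberList` is a hypothetical helper, not Clojure's actual reader.

```typescript
function readNumberList(src: string): number[] {
  const text = src.trim();
  if (!text.startsWith("[") || !text.endsWith("]")) throw new Error("expected [ ... ]");
  return text
    .slice(1, -1)
    .split(/[\s,]+/) // commas and whitespace are interchangeable separators
    .filter((tok) => tok !== "")
    .map((tok) => {
      const n = Number(tok);
      if (Number.isNaN(n)) throw new Error(`not a number: ${tok}`);
      return n;
    });
}
```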
[edit]
Also, this comes up from time to time in the Nix language; I was complaining about it with someone and we were both annoyed by the inconsistency: lists use no delimiter, while sets (i.e. dicts) use a semicolon. I thought that sets should have no delimiters, he thought that lists should, but we both agreed that the inconsistency was jarring.
“sequential tokens are implicitly list items in a list context” (Lisp family, REBOL, and some others) works if it's not clashing with something else, like “sequential tokens are implicitly function/method calls” (Ruby, Haskell).
Of course, if sequences are callables that reproduce the original sequence with the argument appended, then these kind of merge, but with empty list as a prefix instead of list bracketing.
Strangely enough, Nix uses both of those interpretations for sequential tokens, requiring parenthesization of function calls inside a list context. Truly a language with syntax only a mother could love.
It's not "significant whitespace" in the sense that it is usually used.
Few complaining about significant whitespace think:
foo bar = 3
Should assign 3 to a variable named "foo bar" because otherwise it's significant whitespace. The lexer already splits tokens on unquoted whitespace for most languages, so requiring a comma in lists solves ambiguity.
The only real argument against it would be that this looks confusing (though it's still unambiguous!)
[ 1 2 + 3 4]
so one would either have to live with that or parenthesize multi-token expressions in a list. Note that due to Nix's weird function syntax, parenthesizing function-calls in lists is already required there, but c-like languages don't have this problem.
People who hate on “significant whitespace” tend to have a fairly arbitrary set of conditions for which whitespace being significant, and in what way, bothers them.
> Argument 1) is irrelevant since tools can handle any notation, even completely non-readable for human. Argument 2) is weak, however similarity to known things drastically simplifies adoption.
Argument 1 is not about whether tools can handle the comma-first format well or not (which is what the counter-argument focuses on).
Instead, the argument is that tools can themselves solve the problem that comma-first format is used to work around (so there's no need for it).
That is, the "anti-comma-first" argument is not that "we shouldn't use comma first because tools can't easily handle it".
It's "the thing people attempt to solve with comma-first, namely not accidentally forgetting a comma, tools like linters can warn us about, so it's not something we should change our natural formatting to manually solve".
Whether the marker is leading or trailing is orthogonal to whether the first/last marker is optional. And usually you want to have it optional in either case so that e.g. you can write `f(arg)` and don’t have to write `f(,arg)` or `f(arg,)`.
It depends on your system of parsing. For example, code generated by a parser generator would be slightly more complex. A regular expression would also be slightly more complex.
I have written recursive descent parsers in the past, and if I remember correctly, the way I set things up, I had to handle leading list elements specially. It's possible that this was unnecessary and I was simply blinded to a simpler way of doing things because I was following the EBNF too strictly.
Edit: I noticed that we're not actually accounting for an empty list here. That makes things slightly more complex:
comma-separated = [ element, *(",", element) ];
Your code would need to test if there is an element or a list terminator before processing the element.
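A small recursive-descent sketch in TypeScript that handles both points: an optional leading separator (delimiter-first style) and the empty list. The grammar in the comment is a variant of the rule above, tokens are assumed to be pre-split strings, and `parseList` is a hypothetical name.

```typescript
// Grammar sketch:  list = "[" [ [","] element *( "," element ) ] "]"
// Tokens are assumed to be already split, e.g. ["[", ",", "a", ",", "b", "]"].
function parseList(tokens: string[]): string[] {
  let pos = 0;
  const peek = (): string | undefined => tokens[pos];
  const expect = (tok: string): void => {
    if (tokens[pos] !== tok) throw new Error(`expected "${tok}" at ${pos}`);
    pos++;
  };
  const element = (): string => {
    const tok = peek();
    if (tok === undefined || tok === "," || tok === "[" || tok === "]") {
      throw new Error(`expected an element at ${pos}`);
    }
    pos++;
    return tok;
  };

  expect("[");
  const items: string[] = [];
  // The empty-list check: look for the terminator before trying to read an element.
  if (peek() !== "]") {
    if (peek() === ",") pos++; // optional leading separator
    items.push(element());
    while (peek() === ",") {
      pos++;
      items.push(element());
    }
  }
  expect("]");
  if (pos !== tokens.length) throw new Error(`unexpected tokens after position ${pos}`);
  return items;
}
```

The leading separator costs one extra `if`; a lone `[ , ]` is still rejected, because a separator must be followed by an element.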
That's how I do it in Civet (the CoffeeScript of TypeScript). It's a very small amount of complexity to add to the parser for a ton of convenience when using the language. https://github.com/DanielXMoore/Civet
I know, and it works across multiple lines too. However, I never use it. Requirements change, refactoring happens, one of those literals becomes an interpolated string, and all of a sudden I have to rewrite the whole array. Furthermore, %w() is just not nice to look at, IMHO.
The first comma asks the data to confirm comma-separatedness, the second acknowledges the confirmation, and the third is the actual payload for the comma separator.
(And let's just go ahead and disallow trailing three-way-handshake commas for fun. :)
So what I’m arguing for is having a start-of-item token. Like this: ・1 ・2 ・3. Do we need to mark the end of the last token? As we’ll see next, that’s usually not the case.
We have a special word for end-of-item token: terminator, but no startinator or any similar word. I see some irony in this.
"Starter" appears to have more dynamic/active connotations, where initiates and terminates has those same meanings but also seems to have a passive meaning matching terminator. But really, the terms are similar enough that my only reason for opting for "initiator" was the symmetry which extends to related words (e.g. initial/terminal, initiate/terminate)
Mind blown, this is really a new way of thinking about structuring text. At least I'd never considered it before. It was worth reading just for the mind expansion.
Delimiter-first thinking leads to better code when you need to write a loop to produce some separated list.
for (x in items; let comma = false) {
    if (want_to_print(x)) {
        if (comma)
            print ", "
        print x
        comma = true
    }
}
A delimiter-separated list should be regarded as all but the first item being preceded by the delimiter, rather than all but the last item followed by a delimiter.
If you take the latter view, you write silly code that tries to guess whether more items are going to be printed in subsequent iterations, and thus whether to generate a trailing comma now.
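The pseudocode above as a runnable TypeScript sketch; `joinFiltered` is a hypothetical name. The loop never needs to look ahead to decide whether a separator is due, because the separator is written before each kept item except the first.

```typescript
// Writes the items that pass `keep`, comma-separated, in a single forward pass.
function joinFiltered<T>(items: readonly T[], keep: (x: T) => boolean): string {
  let out = "";
  let comma = false; // becomes true once the first kept item has been written
  for (const x of items) {
    if (!keep(x)) continue;
    if (comma) out += ", "; // delimiter-first: the separator goes before the item
    out += String(x);
    comma = true;
  }
  return out;
}
```

Filtering out the last element (`[1, 2, 3, 4]` keeping odd numbers gives `1, 3`) needs no special handling, which is exactly the case the "trailing comma" view has to guess about.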
I think, at least for the list-like examples, this is missing an important aspect of what delimiters ‘mean’ to humans.
The delimiter goes directly after the items because it means ‘but wait, there’s more’, and so mentally puts you into a mindset of being prepared for the next item. If list-like delimiters are placed at the start of lines, the reader is forever in a heightened state of ‘is this thing over or not’ as they go on to the next line.
I’m not arguing that this is world-ending or anything, but I do think it’s what makes me feel ‘uneasy’ when reading the examples.
Many of my colleagues in India write comma-first column expressions in their SQL queries because if they have to add a column, it's a one-line edit, which I'll admit is nice when looking at a diff. But I just can't bring myself to write like that.
Type statements first, in white-space significant languages, are great too. They allow for type based DSLs to be really cleanly included. Think gremlin, SQL, or jsx. Multiline strings are very neat too.
This all comes down to design decisions on a higher level: why aren't your elements fixed-size? Why are you using delimiters at all, instead of a [size|content][size|content] kind of layout? You don't need the delimiter if you know, for each element, how big it is, which also tells you where the next one starts. No more magic delimiters, and no find, substring until, etc.
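A minimal TypeScript sketch of the `[size|content]` layout for a list of strings. The `size:` prefix format is an assumption (similar in spirit to netstrings), and `encode`/`decode` are hypothetical names. Note that the size itself still needs some way to end, here a `:`, and sizes count UTF-16 code units because that is what JavaScript string lengths measure.

```typescript
// Each item is written as "<length>:<content>", with nothing between items.
function encode(items: readonly string[]): string {
  return items.map((s) => `${s.length}:${s}`).join("");
}

function decode(data: string): string[] {
  const items: string[] = [];
  let pos = 0;
  while (pos < data.length) {
    const colon = data.indexOf(":", pos);
    if (colon <= pos) throw new Error(`missing size prefix at ${pos}`);
    const size = Number(data.slice(pos, colon));
    if (!Number.isInteger(size) || size < 0) throw new Error(`bad size at ${pos}`);
    const start = colon + 1;
    if (start + size > data.length) throw new Error("truncated item");
    // The size says exactly where this item ends, so its content may contain anything.
    items.push(data.slice(start, start + size));
    pos = start + size;
  }
  return items;
}
```

Items may freely contain commas or colons, since nothing is ever searched for inside the content.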
I do this sometimes because it makes line comments (to comment out a particular line) much easier. Same with logical operators (and/or). Thought this was pretty common in some languages.
But most code consists of control statements: either "def" for defining stuff and top-level assignments, or control flow in method bodies: local variable assignment, if/else, for/while, return. The proposed notation only simplifies argument passing to called functions, which is, IMHO, not a problem that needs solving.
[1] https://eldar.cz/myf/lab/_sandbox/indentation.html