I can't help myself: the required semicolon at the end of each line in Dart programs keeps hurting after many years of writing Python, JS, Kotlin and Swift. I grew up semicolon-less with QBasic and have used semicolons since Delphi 1.0 and Java 1.0, but today, wanting as little noise on the screen as possible, they hurt my eyes.
I'm the author of the post and work on the Dart language. I spent a bunch of time last year trying to figure out a good way to make semicolons optional in Dart without breaking a lot of stuff.
Because of a handful of syntax quirks particular to Dart, it's really difficult. Most other languages that have optional semicolons were designed from the start to make them optional. Dart was not. On top of that it:
* Uses C-style variable declarations which means there's no keyword to indicate the start of a variable declaration.
* Likewise uses C syntax for function and even local function declarations. Again no keyword marking the beginning of a function.
* Has complex syntax for things like function types, which means a type annotation can be arbitrarily long and may split over multiple lines.
* Has lots of "contextual keywords" that are not true reserved words but behave like them only in certain contexts.
* Has a rich syntax that overloads a lot of tokens to mean different things in different contexts. "{}" can be an empty map, empty set, empty block, empty function body, or empty lambda body.
All of that makes optional semicolons just really difficult. I haven't given up entirely. We have a thing now called "language versioning" that lets different Dart libraries target different specific versions of the language. That lets us evolve the language in nominally "breaking" ways without actually breaking existing code. (For example, this is how we will migrate the world to non-nullable types.)
That may give us a way to make other grammar simplifications that would then also let us make semicolons optional. But syntax changes like this are always difficult. People have very strong opinions about you changing their existing syntax under them, and the value proposition in return isn't super compelling.
There's an old saying, the best time to plant a tree is twenty years ago. The second best time is today.
Language syntax isn't always like that because the migration cost once you have a lot of extant code (and users) can just be too painful. The best time to make semicolons optional was twenty years ago. The next best time may be today, but it may just be never.
Could Dart use this solution, which Python uses and Oil shell now uses?
1. Newlines are significant tokens and they behave like semicolons (this is how shell is)
2. Within () [] {}, newlines are suppressed. The lexer has to recognize the matching pairs. (Oil added an expression language to shell which does this)
This means that
var x = 1 +
2
is not valid because there's a statement terminator after +, but
var x = f(1 +
2)
is valid. I feel like this captures the vast majority of cases where you want to break a statement over lines.
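To make the two rules concrete, here is a minimal sketch in Python, not Oil's or Python's real lexer. It works on raw characters and ignores strings and comments, which a real lexer would have to handle. The function name `split_at_depth_zero` is made up for illustration.

```python
OPENERS = "([{"
CLOSERS = ")]}"

def split_at_depth_zero(source: str) -> list[str]:
    """Split source into statements; a newline terminates only at bracket depth 0."""
    statements = []
    current = []
    depth = 0
    for ch in source:
        if ch in OPENERS:
            depth += 1
        elif ch in CLOSERS:
            depth = max(depth - 1, 0)
        if ch == "\n":
            if depth == 0:
                # Rule 1: outside brackets, a newline behaves like a semicolon.
                text = "".join(current).strip()
                if text:
                    statements.append(text)
                current = []
            else:
                # Rule 2: inside () [] {}, the newline is suppressed.
                current.append(" ")
        else:
            current.append(ch)
    tail = "".join(current).strip()
    if tail:
        statements.append(tail)
    return statements

print(split_at_depth_zero("var x = 1 +\n2"))     # ['var x = 1 +', '2']
print(split_at_depth_zero("var x = f(1 +\n2)"))  # ['var x = f(1 + 2)']
```

The first input splits into two pieces, and a parser would then reject the dangling `+`. The second stays one statement.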
I don't know Dart, but since it seems to have JavaScript-like syntax, I don't see why that wouldn't work. JavaScript has some weird rules; the common gotcha seems to be:
return
{
} // oops this does not return a dictionary like I thought
With those rules you would write:
return {
}
and hopefully the former is a syntax error, not a mis-parsed program.
----
edit: OK my guess is that Dart probably doesn't want to enforce a brace style like Go (and Oil) do?
That is, in Go only
if (x) {
}
is valid, rather than
if (x)
{
}
which is pretty much like the 'return' example. I have been using that brace style for so long that I forgot other people don't :)
var map = {
key: long
- value
}
// block
{
long
- value
}
The newlines should be ignored inside the map literal, but not inside the block. A correct semicolon insertion for this should be:
var map = {
key: long // <-- no semicolon here
- value
};
// block
{
long; // <-- but there is one here
- value;
}
But the lexer doesn't know when curlies are maps and when they are blocks. The parser does, which means you could theoretically define the semicolon insertion rules in the grammar, which is what we'd likely have to do, but that makes it much more complex.
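A rough way to see the problem, sketched in Python: a lexer that only tracks bracket depth labels the newlines in the map literal and in the block identically, so it cannot suppress one set and keep the other. `classify_newlines` is a hypothetical helper, and it ignores strings and comments.

```python
def classify_newlines(source: str) -> list[str]:
    """Label each newline using bracket depth alone, as a context-free lexer would."""
    depth = 0
    labels = []
    for ch in source:
        if ch in "([{":
            depth += 1
        elif ch in ")]}":
            depth -= 1
        elif ch == "\n":
            labels.append("suppressed" if depth > 0 else "terminator")
    return labels

map_literal = "var map = {\nkey: long\n- value\n}\n"
block = "{\nlong\n- value\n}\n"

print(classify_newlines(map_literal))
# ['suppressed', 'suppressed', 'suppressed', 'terminator']
print(classify_newlines(block))
# ['suppressed', 'suppressed', 'suppressed', 'terminator']
```

Both inputs get the same labels, so inside the block `long` and `- value` would be glued into one expression. Telling the two apart needs information only the parser has.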
Ah OK, that makes sense. So Python doesn't have that issue because it doesn't overload braces for hashes and blocks.
Oil does overload them, but it has a separate lexing mode for statements and expressions. It switches when it sees say the 'var' keyword, so the right of var x = {a: 3} is lexed as an expression rather than a statement. This sort of fell out of compatibility constraints with shell, but ended up being pretty convenient and powerful.
The lexing mode relies pretty strongly on having the "first word", e.g. 'var' or 'func', Pascal-style. So yeah I can see how it would be more complex with Java-style syntax.
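As a toy illustration of the "first word picks the mode" idea (this is not Oil's actual lexer, and the token names and the `EXPR_STARTERS` set are invented for the example), a Python sketch might look like:

```python
# Hypothetical: leading words that switch the lexer into expression mode.
EXPR_STARTERS = {"var"}

def brace_tokens(line: str) -> list[str]:
    """Return the brace token IDs for one line, chosen by the line's first word."""
    words = line.split()
    mode = "expr" if words and words[0] in EXPR_STARTERS else "stmt"
    tokens = []
    for ch in line:
        if ch == "{":
            tokens.append("LBRACE_EXPR" if mode == "expr" else "LBRACE_BLOCK")
        elif ch == "}":
            tokens.append("RBRACE_EXPR" if mode == "expr" else "RBRACE_BLOCK")
    return tokens

print(brace_tokens("var x = {a: 3}"))  # ['LBRACE_EXPR', 'RBRACE_EXPR']
print(brace_tokens("if (x) {"))        # ['LBRACE_BLOCK']
```

With C-style declarations such as `int x = {...}` there is no fixed leading keyword to key the mode off, which is where this approach gets harder.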
> So Python doesn't have that issue because it doesn't overload braces for hashes and blocks.
Yes, and also because lambdas in Python can only have expression bodies, not statements. That means you can never have a statement nested inside an expression. This is important because Python's rule to ignore all newlines between parentheses would fall apart if you could stuff a statement-bodied function inside a parenthesized expression.
Yes the lambda issue is something I ran into for Oil. Although this part of the language is deferred, the parser is implemented:
# Ruby/Rust style lambda, with the Python constraint that the body can only be an expression
var inc = |x| x + 1
# NOT possible in Oil because | isn't a distinct enough token to change the lexer mode
var inc = |x| {
return x + 1
}
# This is possible but I didn't implement it.
# "func" can change the lexer mode so { is known to start a block.
# In other words, { and } each have two distinct token IDs.
var inc = func(x) {
return x + 1
}
I think this is a decent tradeoff, but it's indeed tricky and something I probably spent too much time thinking about ... the perils of "familiar" syntax :-/
> Could Dart use this solution, which Python uses and Oil shell now uses?
This works well for Python, because the following causes IndentationError:
x = 1
    + 2
In a language without significant indentation, would this be two statements, an assignment and a separate unary `+` expression? Is that OK?
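This is easy to check from Python itself. The sketch below assumes the continuation line is indented, which is what triggers the error. Flush left, the same text parses as two statements, which is the behavior the question is asking about.

```python
import ast

indented = "x = 1\n    + 2\n"
flush_left = "x = 1\n+ 2\n"

# An indented continuation line is rejected before parsing proper.
try:
    compile(indented, "<example>", "exec")
    outcome = "compiled"
except IndentationError:
    outcome = "IndentationError"

# Without indentation: an assignment, then a standalone unary-plus expression.
statement_kinds = [type(node).__name__ for node in ast.parse(flush_left).body]

print(outcome)          # IndentationError
print(statement_kinds)  # ['Assign', 'Expr']
```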
If not, one possible solution would be to disallow arbitrary expression statements. AFAIK this is what Go does.
> JavaScript has some weird rules, the common gotcha seems to be `return \n {...}`
Disallowing expression statements also fixes this particular example, but it doesn't solve the `return` problem in general. It's not practical to disallow standalone function calls, so `return \n f()` still has the same issue.
Lua solves this by enforcing that a `return` must be the last statement in a block.
> It's not practical to disallow standalone function calls
If your language structurally differentiated procedures (i.e., void-returning functions) from (other) functions, it would be practical to disallow standalone calls to functions (but not to procedures), though you might have a modifier that explicitly discards the result of a function so that the call can stand alone.
There are benefits to unambiguously marking the end of each statement with a semicolon rather than just using a newline. Is there a good algorithm for determining whether or not a newline actually represents the end of a statement?
JavaScript gets this wrong in both directions, sometimes unexpectedly ending a statement (e.g. in `return \n 0`) and sometimes unexpectedly not ending one (e.g. when a new line begins with an open-parenthesis).
Python's method (newline always ends a statement unless it's inside (), [] or {}) is straightforward, but makes the language syntax strictly line-based. This matches Python's significant indentation, but can it work in a language without it?
Another option I've seen is for all newlines to end statements, unless they follow a token that cannot end a statement. Unfortunately that means that the following two assignments have different behavior:
foo = bar +
    baz

foo = bar
    + baz
The first is a sum, the second only assigns `bar`, followed by a standalone unary `+` expression. Go works this way, but considers the second form to be an error ("+baz evaluated but not used"). Python considers both of these to be errors.
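A minimal Python sketch of that rule follows. It assumes whitespace-separated tokens, and the hand-picked `CANNOT_END` set is hypothetical. Go's actual specification phrases it the other way around, listing the tokens after which a semicolon is inserted.

```python
# Hypothetical set of tokens that cannot end a statement.
CANNOT_END = {"+", "-", "*", "/", "=", ",", ".", "(", "[", "{"}

def split_by_last_token(source: str) -> list[str]:
    """A newline ends the statement unless the line's last token cannot end one."""
    statements = []
    pending = []
    for line in source.splitlines():
        tokens = line.split()
        if not tokens:
            continue
        pending.extend(tokens)
        if pending[-1] not in CANNOT_END:
            statements.append(" ".join(pending))
            pending = []
    if pending:
        # Input ended mid-statement; keep the fragment so a parser can report it.
        statements.append(" ".join(pending))
    return statements

print(split_by_last_token("foo = bar +\n    baz"))  # ['foo = bar + baz']
print(split_by_last_token("foo = bar\n    + baz"))  # ['foo = bar', '+ baz']
```

The asymmetry shows up directly: a trailing operator continues the statement, a leading one starts a new statement.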
True, and these kinds of bikeshedding discussions about tiny details are infuriating because they're so irrelevant. I wish we could all rise above them to discuss the next level of expressibility.
We let the little stuff suck up so much of our time.
It’s not that they hurt the eyes for me as much as they introduce tons of micro delays and extra keystrokes when editing. For instance, any “X to end of line” (cut, copy, etc), you now need to think about what the destination is and whether you should include the semi, change the selection, go back and delete the semi later, etc.
It sounds slight, but it really builds up over time. Once you get used to “powerediting” without them it’s hard to go back.
End of line inference algorithms are incredibly fragile, both for the compiler and for the human reader. Having a semicolon can make things clearer in the absence of a secondary signal (like indentation). There are so many type errors in Scala or TypeScript that can be solved by adding a semicolon somewhere to eliminate some idiotic ambiguity.
Only-EOL like Python is fine and always-semicolon like C is fine. Sometimes-optional but sometimes-required semicolons like JavaScript's are bad in my experience. JavaScript's C-like syntax doesn't cope well with missing semicolons.
> End of line inference algorithms are incredibly fragile
I am working on a language right now with optional semicolons (that is, newlines terminate statements unless there is a semicolon, which does the same early) and it’s not really that hard. The only change is that instead of looking for a newline character for statement termination I often also need to look for a semicolon.
> newlines terminate statements unless there is a semicolon, which does the same early
Since I'm writing a toy language with optional semicolons, I'm curious - how are you solving ambiguities in multi-line expressions? For example:
1
-1
Does your language not allow multi-line expressions, treating the above as two statements? Or, perhaps they must be explicitly an expression, like:
(1
-1)
I agree with you on another comment, that JavaScript's automatic semicolon insertion is complicated, so I'd like to know if there's a sensible solution for languages without semicolons.
Ah, I see, so newlines are like invisible operators (with low precedence) that end a statement unless preceded by another operator. Right, that makes sense.
Python behaves "sensibly" with regard to this. Semicolons are statement breaks. Newlines are statement breaks, unless there's some reason for them not to be, such as being in an open-brace state.
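You can see this by counting top-level statements with the standard `ast` module. This is a quick check, with `count_statements` as a throwaway helper name.

```python
import ast

def count_statements(source: str) -> int:
    """Number of top-level statements Python's parser sees in source."""
    return len(ast.parse(source).body)

print(count_statements("a = 1; b = 2"))            # 2: semicolon breaks the statement
print(count_statements("a = 1\nb = 2"))            # 2: newline breaks the statement
print(count_statements("x = max(1 +\n    2, 0)"))  # 1: newline inside () is ignored
```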
Why not infer the end of a line when finding the end of a line? ;)
This is what most BASICs and assemblers did.
(I think QBasic had a line continuation character? - or maybe that came with Visual BASIC. I don't remember. But, anyway, if you wanted, it was possible to continue a source line past a line break, it just wasn't the default.)