Is that really some "special behavior"? As far as I understand, the semicolon is just an expression separator like in Erlang and Pascal. In contrast, the semicolon is a statement terminator in C; C uses the comma for expression separation. Practically everything is an expression in Rust, even blocks, so
foo() => evaluates to the return value of foo
{ foo() } => evaluates to the return value of foo
{ foo(); } => evaluates to nil
The last case is not a special case. The last expression in the block determines the returned value. There is an expression separator in there, so there must be two expressions which are separated by it. The first one is foo() and second one is ... wait for it ... the empty expression. Which value should EmptyExpression have? Of course: nil, which would be called void in C-land.
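For comparison, here is how the three cases look in current Rust, where the unit value is written `()` rather than called nil. This is a minimal sketch: `foo` and the three wrapper functions are hypothetical stand-ins, with `foo` assumed to return 42.

```rust
// Hypothetical stand-in for `foo` from the comment above.
fn foo() -> i32 {
    42
}

// `foo()`: evaluates to the return value of foo.
fn plain() -> i32 {
    foo()
}

// `{ foo() }`: a block whose last expression has no `;` yields that value.
fn block() -> i32 {
    { foo() }
}

// `{ foo(); }`: the block ends in a statement, so its value is `()`.
fn block_with_semicolon() {
    { foo(); }
}

fn main() {
    println!("{} {} {:?}", plain(), block(), block_with_semicolon()); // 42 42 ()
}
```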
There was some discussion about doing this in D, and there was some in C++11, too. I argued that the presence or absence of the ; did not visually stand out very well, and so would be a source of confusion and errors.
Hence, ; remained as a statement terminator, and the return keyword served to indicate returning an expression.
However, the D lambda syntax does not require a ; when the lambda body consists of only a single expression:
Ignoring a potentially wanted return value with ";" is a hazard. I'd like to institute some set of "warn-unused-result" warnings in Rust to combat this.
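Rust did later gain roughly this: the `#[must_use]` attribute makes rustc warn (through the `unused_must_use` lint) when a caller throws a result away with a bare `;`. A small sketch, with `checked_double` as a hypothetical function:

```rust
// Marking the function `#[must_use]` asks the compiler to warn whenever
// its result is discarded.
#[must_use]
fn checked_double(x: i32) -> i32 {
    x * 2
}

fn main() {
    checked_double(21); // rustc warns here: the result is dropped by the `;`
    let y = checked_double(21); // no warning: the value is bound and used
    println!("{}", y); // 42
}
```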
This is correct for any "normal" (a.k.a. "item level") function, since their signatures must always be explicit. But you're allowed to infer the signatures of closures, which is the only place where this problem could arise.
// This function lives at the top level of the module, so its types must be annotated
fn foo(bar: int) -> int {
    // Here's a closure without type annotations:
    let baz = |qux| { qux + 1 };
    // And a closure with type annotations:
    let baz = |qux: int| -> int { qux + 1 };
    // Which means that this would be a compile-time error:
    let baz = |qux: int| -> int { qux + 1; }; // returning nil, but expected int
    // But if you expect this to return int, then there are very few cases where
    // this might not be a compile-time error:
    let baz = |qux| { qux + 1; };
    log(error, baz(bar)); // will print "()", but did you want bar + 1?
    io::println(fmt!("%?", baz(bar))); // same as previous
    io::println(fmt!("%d", baz(bar))); // this one *is* a compile-time error
    // macros ftw
    baz(bar) // this is also a compile-time error: foo must return int, but baz(bar) is nil
}
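The snippet above uses pre-1.0 syntax (`int`, `log`, `fmt!`) that no longer compiles. A rough modern equivalent of the parts that do type-check, assuming `i32` in place of `int`; the function names are hypothetical:

```rust
// Parameter annotated, return type left to inference: the trailing `;`
// turns the body into a statement, so the closure returns `()`.
fn unit_closure_result(bar: i32) -> () {
    let baz = |qux: i32| { qux + 1; };
    baz(bar)
}

// Fully annotated closure: writing `qux + 1;` here would be a type error
// (expected `i32`, found `()`), so the body must end without `;`.
fn annotated_closure_result(bar: i32) -> i32 {
    let baz = |qux: i32| -> i32 { qux + 1 };
    baz(bar)
}

fn main() {
    // `{:?}` happily prints the unit value, which is how the mistake can slip by.
    println!("{:?}", unit_closure_result(1)); // ()
    // `{}` requires `Display`, which `()` does not implement, so
    // `println!("{}", unit_closure_result(1))` would not compile.
    println!("{}", annotated_closure_result(1)); // 2
}
```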
In Rust, nil is the name for the unit type (written as empty parentheses, like "()"), used whenever a function doesn't return anything meaningful. You're correct in that instead of null pointers it has the None variant of the Option enum.
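A short sketch of that distinction in current Rust; `log_message` and `first_even` are hypothetical examples, not library functions:

```rust
// `()` is the unit type: it has exactly one value and is used when a
// function returns nothing meaningful.
fn log_message(msg: &str) -> () {
    println!("{}", msg);
}

// Absence of a value is expressed with `Option`, not with a null pointer.
fn first_even(values: &[i32]) -> Option<i32> {
    values.iter().copied().find(|v| v % 2 == 0)
}

fn main() {
    let unit: () = log_message("done"); // prints "done"
    println!("{:?}", unit); // ()
    println!("{:?}", first_even(&[1, 3, 4, 6])); // Some(4)
    println!("{:?}", first_even(&[1, 3])); // None
}
```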
Although everything you say is technically true, readers will tend to get the wrong idea unless we add that (1) most users of the do notation choose Haskell's significant white space rather than semicolons and (2) the (rare) Haskell code that does contain semicolons that behave the way you describe probably also contains semicolons (e.g., in Haskell's let statement and case statement) that have nothing to do with monads or the "sequencing" of side effects.
In other words, the semicolons of Haskell are only tenuously related to semicolons in languages like C.
That's not the point. Most programmers are not language lawyers. It is very bug-prone, which in itself could be reason enough to remove this source of confusion from a language that advertises itself as "safe".
Bug prone? The compiler will inevitably warn you about your error at compile time. No such bug will have any consequence beyond that.
This is also not such a big problem when a human reads the function. We can always see at a glance the return type of the function. Either it's explicitly declared, or it's a lambda whose usage is readily visible. Semicolon or not, you can easily guess if the function is supposed to return its last expression, or not.
By the way, the compiler could do the same. Given that, there will probably be helpful error messages such as "did you forget the last semicolon?" or "should you remove the last semicolon?".
The semicolon is really just a small confirmation. That's why they didn't choose a heavier syntax.
Ok, I misunderstood then. I thought it would subtly return no value instead of a value, and you'd only discover this at run time. That would be bug prone. If it simply generates a compilation error, no problem.
The thing is that in a safe language, an expression always evaluates to a value of the expression's type[1]. E.g. in Haskell, if a function returns a list:
foo :: Int -> [a]
It should always evaluate to a list. There is no such thing as a null pointer. The only thing that comes close to a nullable type is an option type such as Maybe:
That's what makes it elegant: it lets you omit the return statement (safely) but otherwise looks exactly the same as C, even though it's semantically different.
Uh, it depends on taste. There are a few reasons to want it, such as lambdas as WalterBright mentioned, but I like it because when this is done in a language where statements and expressions are unified, ";" can neatly replace "," from C, and comma can be reused for something else. (Unifying statements and expressions has its own benefits, but we can start with avoiding GNU C's hideous "({ foo; bar; })", which evaluates to the value of bar...)
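A sketch of the Rust counterpart to that GNU C statement expression: a plain block is an expression whose statements are separated by `;` and whose value is its final expression. The helper names are hypothetical:

```rust
// Hypothetical helper with a side effect.
fn bump(counter: &mut i32) {
    *counter += 1;
}

// Roughly GNU C's `({ foo; bar; })`: run the statements, then yield the tail.
fn block_value() -> (i32, i32) {
    let mut counter = 0;
    let value = { bump(&mut counter); bump(&mut counter); counter * 10 };
    (counter, value)
}

// `if` is an expression too, so no separate `?:` operator is needed.
fn sign(x: i32) -> i32 {
    if x < 0 { -1 } else if x == 0 { 0 } else { 1 }
}

fn main() {
    println!("{:?}", block_value()); // (2, 20)
    println!("{} {} {}", sign(-5), sign(0), sign(7)); // -1 0 1
}
```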
I memorably referred to Rust's significant semicolons as "the worst thing in the language" after my very first read through the tutorial, last November.
Almost a year later, I'm just as in love with Rust's semicolon rules as Armin is. However, I bet new users will still be just as instinctively revolted as I initially was.
I guess it depends on what languages you're experienced with. I just read through the Rust tutorial for the first time 2 days ago, and I fell in love with the semicolon rule at first sight.
The downside is that you would have to put () (Rust's version of “nil”) at the end of a bunch of functions to fulfil the requirements of the callback's signature, since otherwise the type inferred for the function would be the type of its last expression.
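Under Rust's actual rule, the trailing `;` is what does that job. In this sketch (the helper names are hypothetical), `HashSet::insert` returns a `bool`, so without the `;` the closure would not match a callback type that returns `()`:

```rust
use std::collections::HashSet;

// Hypothetical higher-order function: the callback must return `()`.
fn for_each_value(values: &[i32], mut callback: impl FnMut(i32)) {
    for &v in values {
        callback(v);
    }
}

fn count_distinct(values: &[i32]) -> usize {
    let mut seen = HashSet::new();
    // `|v| seen.insert(v)` would return `bool` and be rejected here;
    // the `;` discards the bool so the closure returns `()`.
    for_each_value(values, |v| { seen.insert(v); });
    seen.len()
}

fn main() {
    println!("{}", count_distinct(&[1, 2, 2, 3, 1])); // 3
}
```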
Wouldn't co/contra-variance solve this entirely? It works just fine in Scala, for example.
Hello American. I'm from one of those other pesky countries that make up most of the human population of earth, and which usually have other keyboard layouts. I have to type shift+, to get the semicolon.
Indeed. On a Swedish keyboard the semicolon is not the biggest problem, though. {}[] are, since you need to press right-alt plus the number keys to get them. Really awful. It takes a very short time using an American keyboard for coding to realize its benefits.
Still, even on an American keyboard () are not terribly well placed.
I've been thinking about adopting the NEO2 layout (with some modification so that I get all of åäö) but haven't gotten around to it yet. Has anyone tried that for programming?
I have all my symbol characters easily accessible from the normal A-Z keys using a third shift state. It's awesome, and works with any keyboard layout (QWERTY, or even Swedish QWERTY are all fine, as is Dvorak, Colemak etc). Typing a previously uncomfortable sequence like "for (int i = 0; i < count; ++i)" is even pleasant now.
It takes about two weeks to get used to. If you try it, make sure the third shift state can be accessed from either hand (i.e. you have to have left and right keys, just like with shift or control).
I wasn't aware of NEO2 so I developed my own symbol layout using a genetic algorithm running over my code corpus, then adjusting to taste.
Another solution is xmodmap in Unix or some similar tool in Windows to change the keyboard layout. With that solution you don't have to give up the default national keyboard layout.
I've never seen QWERTZ keyboard in Poland and I live there. Typewriters indeed were using this layout, but not many people use them to program.
"practically all computers (except custom-made, e.g., in public sector and some Apple computers) use standard US layout (commonly called Polish programmers layout, in Polish: polski programisty) with Polish letters accessed through AltGr (AltGr-Z giving “Ż” and AltGr-X giving “Ź”)."
http://keyboard-layout.info/#Polish
Norwegian. Many European countries use the same placement for semicolon. All the Scandinavian and Baltic countries, Germany and Netherlands to name a few.
Russia uses shift+4. Can't imagine how annoying that is.
The Russian keyboard layout has no Roman letters, so unless you're writing in Soviet-developed languages like http://en.wikipedia.org/wiki/Rapira (obscure even in their time), reaching for shift+4 for ';' is going to be the least of your concerns if you try to write code using the Russian layout.
Nitpick: Almost. There is a Dutch keyboard layout, but nobody uses it. Nobody, as in, there are more people in the Netherlands who use Dvorak than people who use the Dutch keyboard layout.
The Dutch commonly use the US keyboard layout, which makes sense because Dutch and English have the same alphabet.
Is it just me or is this kind of ugly in the first place?
We're trying to mix logic that executes on each element in a sequence with logic that controls how to iterate over that sequence. That is, a "return 42" statement will tell .each to stop iterating, but it's contained within a block that is supposed to do things to individual elements.
The mathematical concept just doesn't sit right with me... I guess if all you have as an iterator is 'each', then you'd need some additional way to modify iteration, but still... I don't like the fact that a return statement can break out of something outside of its own scope.
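In current Rust, `return` inside a closure only returns from that closure, never from the enclosing function, so early exit is usually handed to the iterator adapter instead. One way to keep the per-element logic separate from the decision to stop is `Iterator::try_for_each` with `std::ops::ControlFlow`; `first_over` is a hypothetical example:

```rust
use std::ops::ControlFlow;

// The closure only reports what to do with one element; `try_for_each`
// owns the loop and stops at the first `Break`.
fn first_over(values: &[i32], limit: i32) -> ControlFlow<i32> {
    values.iter().try_for_each(|&v| {
        if v > limit {
            ControlFlow::Break(v) // stop iterating and carry the value out
        } else {
            ControlFlow::Continue(()) // move on to the next element
        }
    })
}

fn main() {
    println!("{:?}", first_over(&[1, 5, 42, 7], 10)); // Break(42)
    println!("{:?}", first_over(&[1, 2], 10)); // Continue(())
}
```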
I still don't understand why they couldn't just introduce a new keyword that has the same meaning as a missing semicolon. Call it 'ret' or something as a 'lesser return'. Relying on ppl to be semicolon hunters is poor for readability.
It also means you can't put more expressions on the same line, because doing so requires a semicolon, which then eliminates the special semicolon behavior. Having an explicit 'ret' keyword means you could accomplish that and put more expressions for a separate block on the same line if desired.
No, you wouldn't. See JavaScript for all the pain that causes. Language syntax needs to be dictatorial so that the language is consistent for reading (which is more important than writing).
Technically, Python is a much better example of optional semicolons since you can use them and they're not required. Of course, the real reason they exist is for compound statements.
Untrue. While it's true that most people use the 'layout' rule, which is the significant whitespace, there is a non-whitespace-significant variant that uses braces and semicolons. You can see an example of how layout is expanded to the alternative syntax at http://www.haskell.org/onlinereport/lexemes.html#lexemes-lay...
Very few languages consider non-indentation whitespace as significant (not counting using whitespace as a token delimiter, which Haskell definitely does, e.g. "foo bar" =/= "foobar").
For Ruby at least "heavily" is a big exaggeration. It is significant as a token separator and in cases where inserting a line break makes the expression up to the line break a valid expression in itself, in which case it is treated as one.
You pretty much only need to remember to separate tokens where lack of whitespace might create a different valid token, and ensure any expressions that you want to let span more than one line end with something that expects following tokens to make a complete expression.
Personally I can live with that easily. I cannot live with significant indentation, on the other hand...
Yes, but many people assume whitespace everywhere will matter when they hear Python's offside rule described. It's a misconception I've found is worth dispelling, since almost no one minds meaningful indentation.
Normally in Rust you'd never use a trait like that without bringing it into the local namespace, so you wouldn't need to qualify it with its module like Armin does. And in future versions of Rust, Num will inherit from both the Eq and Copy traits, so the signature would instead look like this:
I haven't looked at Rust, but the semicolon looks like a decision to make some common, verbose, or "ugly" syntax less noisy. Admittedly most syntax has quirks, and this seems more like a quirk than an instance of "clever design". Ignoring the explanation of the differences between statements and expressions, the rest of the discussion is about the presence of a semicolon.
"But the alternative to semicolons is making line endings significant."
Only sorta - consider Haskell's indent rule: code which is part of an expression should be indented further than the start of that expression. Generally, this is something you should be doing to make your code readable regardless of the statement termination.
And with Rust's strong static typing, it's even easier to spot. While I love a preference for expressions, in CoffeeScript I've had to put a single null at the end of functions several times.
The presence/absence of the ; is both explicit and subtle enough to not annoy.
(As a prerequisite to making its point, the article teaches many intricate details regarding two dynamic languages which even most of their practitioners probably never think about, then dives into more intricate details regarding a language which most of us have probably never even used, let alone grokked all the details of. I sense that there's something important here for me to learn, but from where I sit this article is a lot to bite off all at once.)
A tight summary by someone who understands all this would be appreciated by many readers, not just myself.
I don't think parent deserves to be modded down. The article did take a pretty long time to get to the point.
Why all this talk about Python and Ruby? Rust is not really competing with those languages; it is quite specifically designed as a C++ replacement. They should be comparing themselves against C++, Golang, or D.
The Golang solution would just be a function which returns the next element or nil if there are no more elements. That function might be part of an interface, if you wanted to generalize it across several types. To me, this is a lot simpler than the other stuff that was discussed.
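For comparison, Rust's standard `Iterator` trait has essentially that shape: `next` returns the next element, or `None` when there are no more, and the trait plays the role of the generalizing interface. A minimal sketch with a hypothetical `Countdown` type:

```rust
// Hypothetical iterator counting down from `remaining` to 1.
struct Countdown {
    remaining: u32,
}

impl Iterator for Countdown {
    type Item = u32;

    // Like the Go approach, but "no more elements" is `None` instead of nil.
    fn next(&mut self) -> Option<u32> {
        if self.remaining == 0 {
            None
        } else {
            self.remaining -= 1;
            Some(self.remaining + 1)
        }
    }
}

fn countdown(from: u32) -> Vec<u32> {
    Countdown { remaining: from }.collect()
}

fn main() {
    println!("{:?}", countdown(3)); // [3, 2, 1]
}
```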
To go back to Rust specifically, rather than having magic semicolons, why not make the "break" statement take a value, which becomes the return value of the block? Despite all the tl;dr there was not much discussion of design alternatives.
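Later versions of Rust did add something close to this, alongside the semicolon rule rather than instead of it: `break` can carry a value out of a `loop`, and out of a labeled block. A sketch with hypothetical function names (labeled blocks need a reasonably recent compiler):

```rust
// `break` with a value: the `loop` expression evaluates to `n * n`.
fn first_square_over(limit: u32) -> u32 {
    let mut n = 0;
    loop {
        n += 1;
        if n * n > limit {
            break n * n;
        }
    }
}

// A labeled block can be exited early with a value via `break 'label value`.
fn classify(x: i32) -> &'static str {
    'done: {
        if x < 0 {
            break 'done "negative";
        }
        if x == 0 {
            break 'done "zero";
        }
        "positive"
    }
}

fn main() {
    println!("{}", first_square_over(10)); // 16
    println!("{} {} {}", classify(-3), classify(0), classify(5)); // negative zero positive
}
```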