Such a Little Thing: The Semicolon in Rust

qznc · on Oct 18, 2012

Is that really some "special behavior"? As far as I understand, the semicolon is just an expression separator, like in Erlang and Pascal. In contrast, the semicolon is a statement terminator in C, and C uses the comma for expression separation. Practically everything is an expression in Rust, even blocks, so

  foo()       => evaluates to the return value of foo
  { foo() }   => evaluates to the return value of foo
  { foo(); }  => evaluates to nil

The last case is not a special case. The last expression in the block determines the returned value. There is an expression separator in there, so there must be two expressions which are separated by it. The first one is foo() and second one is ... wait for it ... the empty expression. Which value should EmptyExpression have? Of course: nil, which would be called void in C-land.
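
For readers on a current toolchain, here is a minimal sketch of the same three cases. It assumes present-day Rust syntax (the thread predates Rust 1.0, where "nil" was the name for what is now called the unit value `()`), and `foo` is a hypothetical function used only for illustration.

```rust
// `foo` is a hypothetical function used only for illustration.
fn foo() -> i32 {
    42
}

// The inner block ends in an expression, so it evaluates to foo()'s value.
fn without_semicolon() -> i32 {
    { foo() }
}

// The inner block ends in `;`, so it evaluates to the unit value `()`.
fn with_semicolon() {
    { foo(); }
}

fn main() {
    let a = without_semicolon();
    let b = with_semicolon();
    println!("{:?} {:?}", a, b); // prints: 42 ()
}
```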

WalterBright · on Oct 18, 2012

There was some discussion about doing this in D, and there was some in C++11, too. I argued that the presence or absence of the ; did not visually stand out very well, and so would be a source of confusion and errors.

Hence, ; remained as a statement terminator, and the return keyword served to indicate returning an expression.

However, the D lambda syntax does not require a ; when the lambda body consists of only a single expression:

http://dlang.org/expression.html#Lambda

and in practice this has turned out to be well liked.

pcwalton · on Oct 18, 2012

Ignoring a potentially wanted return value with ";" is a hazard. I'd like to institute some set of "warn-unused-result" warnings in Rust to combat this.
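
As a hedged illustration of what such a warning can look like: present-day Rust has a `#[must_use]` attribute that makes the compiler warn when a result is dropped by a bare `;`. The function `checked_increment` below is a made-up name, and this sketch does not claim to be the mechanism proposed in this comment.

```rust
// `checked_increment` is a hypothetical function for illustration.
#[must_use]
fn checked_increment(x: i32) -> i32 {
    x + 1
}

fn main() {
    // Writing the call as a bare statement would compile, but rustc
    // reports an `unused_must_use` warning for it:
    // checked_increment(1);

    // Binding to `_` states that the value is dropped on purpose.
    let _ = checked_increment(1);

    // Using the value is the normal case.
    let y = checked_increment(1);
    println!("{}", y); // prints: 2
}
```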

pdw · on Oct 19, 2012

Wouldn't the type system complain that you're returning nil in a function that is not declared/inferred as returning nil?

kibwen · on Oct 19, 2012

This is correct for any "normal" (a.k.a. "item level") function, since their signatures must always be explicit. But you're allowed to infer the signatures of closures, which is the only place where this problem could arise.

  // This function lives at the top level of the module, so its types must be annotated
  fn foo(bar: int) -> int {
      // Here's a closure without type annotations:
      let baz = |qux| { qux + 1 };
      // And a closure with type annotations:
      let baz = |qux: int| -> int { qux + 1 };
      // Which means that this would be a compile-time error:
      let baz = |qux: int| -> int { qux + 1; };  // returning nil, but expected int
      // But if you expect this to return int, then there are very few cases where
      // this might not be a compile-time error:
      let baz = |qux| { qux + 1; };
      log(error, baz(bar));  // will print "()", but did you want bar + 1?
      io::println(fmt!("%?", baz(bar)));  // same as previous
      io::println(fmt!("%d", baz(bar)));  // this one *is* a compile-time error
                                          // macros ftw
      baz(bar);  // this will also be a compile-time error
  }
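
The snippet above uses 2012-era syntax. A rough equivalent in present-day Rust might look like the sketch below; `add_one`, `with_tail`, and `with_semicolon` are hypothetical names introduced here, and `i32` stands in for the old `int`.

```rust
// Hypothetical helper, used so that the discarded value is a plain call.
fn add_one(x: i32) -> i32 {
    x + 1
}

// Closure body ends in an expression: the closure returns i32.
fn with_tail(bar: i32) -> i32 {
    let baz = |qux: i32| -> i32 { add_one(qux) };
    baz(bar)
}

// Closure body ends in `;`: the inferred return type is ().
fn with_semicolon(bar: i32) {
    let baz = |qux: i32| { add_one(qux); };
    // Annotating this closure with `-> i32` would be a compile-time error.
    baz(bar)
}

fn main() {
    println!("{:?}", with_tail(1));      // prints: 2
    println!("{:?}", with_semicolon(1)); // prints: ()
}
```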

dagw · on Oct 19, 2012

What if foo() could return nil in some cases?

danieldk · on Oct 19, 2012

Then you need an algebraic data type having nil as one of its constructors, and that function returning that type.

AFAIK, Rust doesn't have nil. So, the closest thing is using the option type.

kibwen · on Oct 19, 2012

In Rust, nil is the name for the unit type (written as empty parentheses, like "()"), used whenever a function doesn't return anything meaningful. You're correct in that instead of null pointers it has the None variant of the Option enum.
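
A small sketch of the distinction, in present-day syntax: a function with nothing meaningful to return yields `()`, while a function that may or may not have a result returns an `Option`. The names `log_line` and `half` are assumptions made up for this example.

```rust
// Returns the unit value `()`: there is nothing meaningful to hand back.
fn log_line(msg: &str) {
    println!("{}", msg);
}

// May or may not produce a value, so the absence is encoded in the type.
fn half(n: i32) -> Option<i32> {
    if n % 2 == 0 {
        Some(n / 2)
    } else {
        None
    }
}

fn main() {
    let unit: () = log_line("hello");
    println!("{:?}", unit); // prints: ()

    match half(10) {
        Some(h) => println!("half is {}", h),
        None => println!("odd input, no result"),
    }
}
```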

qznc · on Oct 19, 2012

An understandable point of view.

Though I can think of an even more crazy variation. Declare the semicolon a binary operator and allow overloading it.

As funny as that sounds, Haskell provides something like this, as do-notation can behave differently depending on the monad it is in.

ajuc · on Oct 19, 2012

Your post made me check it, and you CAN overload "operator," in C++ apparently :)

http://en.wikibooks.org/wiki/C%2B%2B_Programming/Operators/O...

My mind is blown. You can write bottom-up code in C++ :)

EDIT: no, you still can't; the expressions are evaluated first, before calling "operator,"

hollerith · on Oct 19, 2012

Although everything you say is technically true, readers will tend to get the wrong idea unless we add that (1) most users of the do notation choose Haskell's significant white space rather than semicolons and (2) the (rare) Haskell code that does contain semicolons that behave the way you describe probably also contains semicolons (e.g., in Haskell's let statement and case statement) that have nothing to do with monads or the "sequencing" of side effects.

In other words, the semicolons of Haskell are only tenuously related to semicolons in languages like C.

sp332 · on Oct 18, 2012

It's odd because people think of semicolons as the end of a statement, not a delimiter between statements.

adestefan · on Oct 18, 2012

Which is usually wrong.

wladimir · on Oct 19, 2012

That's not the point. Most programmers are not language lawyers. It is very bug-prone, which could in itself be enough reason to remove the confusion from a language that advertises itself as "safe".

loup-vaillant · on Oct 19, 2012

Bug prone? The compiler will inevitably warn you about your error at compile time. No such bug will have any consequence beyond that.

This is also not such a big problem when a human reads the function. We can always see at a glance the return type of the function. Either it's explicitly declared, or it's a lambda whose usage is readily visible. Semicolon or not, you can easily guess if the function is supposed to return its last expression, or not.

By the way, the compiler could do the same. Knowing that, there probably will be helpful error messages such as "did you forget the last semicolon?", or "should you remove the last semicolon?".

The semicolon is really just a small confirmation. That's why they didn't choose a heavier syntax.

wladimir · on Oct 19, 2012

Ok, I misunderstood then. I thought it would subtly return no value instead of a value, and you'd only discover this at run time. That would be bug prone. If it simply generates a compilation error, no problem.

danieldk · on Oct 19, 2012

The thing is that in a safe language, an expression always evaluates to a value of the type of the expression[1]. E.g. in Haskell, if a function returns a list:

  foo :: Int -> [a]

It should always evaluate to a list. There is no such thing as a null pointer. The only thing that comes close to a nullable type is an option type such as Maybe:

  data Maybe a = Just a | Nothing

The function

  bar :: Int -> Maybe [a]

can either evaluate to a Just [a] or Nothing.

[1] Actually, I lied a bit, since there is bottom: http://www.haskell.org/haskellwiki/Bottom But bottom does not fulfill the same role as, say, null pointers.

comex · on Oct 18, 2012

That's what makes it elegant: it lets you omit the return statement (safely) but otherwise looks exactly the same as C, even though it's semantically different.

eternalban · on Oct 18, 2012

parsimony =/= elegance

comex · on Oct 18, 2012

Uh, it depends on taste. There are a few reasons to want it, such as lambdas as WalterBright mentioned, but I like it because when this is done in a language where statements and expressions are unified, ";" can neatly replace "," from C, and comma can be reused for something else. (Unifying statements and expressions has its own benefits, but we can start with avoiding GNU C's hideous "({ foo; bar; })", which evaluates to the value of bar...)

kibwen · on Oct 18, 2012

I memorably referred to Rust's significant semicolons as "the worst thing in the language" after my very first read through the tutorial, last November.

Almost a year later, I'm just as in love with Rust's semicolon rules as Armin is. However, I bet new users will still be just as instinctively revolted as I initially was.

lilyball · on Oct 18, 2012

I guess it depends on what languages you're experienced with. I just read through the Rust tutorial for the first time 2 days ago, and I fell in love with the semicolon rule at first sight.

lihaoyi · on Oct 18, 2012

    The downside is that you would have to put () (Rust's version of “nil”) in a bunch of functions to fulfil the requirements of the callback's signature since otherwise the type inferred from the function would be the value of the last expression

Wouldn't co/contra-variance solve this entirely? It works just fine in Scala for example

    scala> def runTwice(f: () => Unit) = {
         |  f()
         |  f()
         | }
    runTwice: (f: () => Unit)Unit

    scala> runTwice{ () =>
         |  println("moo")
         |  1
         | }
    moo
    moo

Note how it expects a function that returns Unit. I'm passing in a function that returns an Int (1), but the compiler is perfectly happy.

pcwalton · on Oct 18, 2012

I implemented basically this in Rust, and it was overwhelmingly rejected by the community at the time. People seem to like the strict typechecking.
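
To make the strict behavior concrete, here is a minimal sketch in present-day Rust. `run_twice` mirrors the Scala example above; `count_calls` is a hypothetical wrapper added only so the effect can be observed.

```rust
// Accepts only callbacks that return (), like Scala's `() => Unit`.
fn run_twice<F: FnMut()>(mut f: F) {
    f();
    f();
}

fn count_calls() -> i32 {
    let mut calls = 0;

    // The trailing `;` discards the value, so the closure returns ().
    run_twice(|| { calls += 1; });

    // Unlike the Scala version, a closure whose body ends in an integer
    // is rejected by the type checker (mismatched types):
    // run_twice(|| { calls += 1; 1 });

    calls
}

fn main() {
    println!("{}", count_calls()); // prints: 2
}
```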

MartinCron · on Oct 18, 2012

> Are semicolons annoying to type? Probably, I got used to them

The semicolon is the least-annoying non-letter character to type. It's right there on your home row.

glennsl · on Oct 18, 2012

Hello American. I'm from one of those other pesky countries that make up most of the human population of earth, and which usually have other keyboard layouts. I have to type shift+, to get the semicolon.

lucian1900 · on Oct 18, 2012

A potential workaround is to always use an American keyboard. It's what I do.

symmetricsaurus · on Oct 19, 2012

Indeed. On a Swedish keyboard the semicolon is not the biggest problem though. {}[] are, since you need to press right-alt+number keys to get them. Really awful. It takes a very short time using an American keyboard for coding to realize its benefits.

Still, even on an American keyboard () are not terribly well placed.

I've been thinking about adopting the NEO2 layout (with some modification so that I get all of åäö) but haven't gotten around to it yet. Has anyone tried that for programming?

tomlu · on Oct 19, 2012

Hej!

I have all my symbol characters easily accessible from the normal A-Z keys using a third shift state. It's awesome, and works with any keyboard layout (QWERTY, or even Swedish QWERTY are all fine, as is Dvorak, Colemak etc). Typing a previously uncomfortable sequence like "for (int i = 0; i < count; ++i)" is even pleasant now.

It takes about two weeks to get used to. If you try it, make sure the third shift state can be accessed from either hand (i.e. you have to have left and right keys, just like with shift or control).

I wasn't aware of NEO2 so I developed my own symbol layout using a genetic algorithm running over my code corpus, then adjusting to taste.

bitcracker · on Oct 19, 2012

Another solution is xmodmap in Unix or some similar tool in Windows to change the keyboard layout. With that solution you don't have to miss the default national keyboard layout.

MartinCron · on Oct 19, 2012

I feel your pain, I have to jump through all kinds of høøps to get some characters on my keyboård.

bla2 · on Oct 19, 2012

I switch from my native layout to the US layout for coding.

georgy_the_dev · on Oct 18, 2012

Which keyboard layout is that?

DanBC · on Oct 18, 2012

Czech; German; Polish; French; Spanish; Italian; Portuguese; etc etc etc.

adamnemecek · on Oct 18, 2012

That being said, there are also programmer's layouts specific to single languages which among others have a semi-colon on the home row.

naitbit · on Oct 19, 2012

I've never seen QWERTZ keyboard in Poland and I live there. Typewriters indeed were using this layout, but not many people use them to program.

"practically all computers (except custom-made, e.g., in public sector and some Apple computers) use standard US layout (commonly called Polish programmers layout, in Polish: polski programisty) with Polish letters accessed through AltGr (AltGr-Z giving “Ż” and AltGr-X giving “Ź”)." http://keyboard-layout.info/#Polish

emillon · on Oct 19, 2012

There's no need to use shift to get a semicolon on a french keyboard.

glennsl · on Oct 18, 2012

Norwegian. Many european countries use the same placement for semicolon. All the Scandinavian and Baltic countries, Germany and Netherlands to name a few.

Russia uses shift+4. Can't imagine how annoying that is.

pandaman · on Oct 19, 2012

The Russian keyboard layout does not have any Roman letters, so apart from Soviet-developed languages that were obscure even in their time, like http://en.wikipedia.org/wiki/Rapira, reaching for shift+4 for ';' is going to be the least of your concerns if you try to write code using the Russian layout.

skrebbel · on Oct 20, 2012

Nitpick: Almost. There is a Dutch keyboard layout, but nobody uses it. Nobody, as in, there's more people in the Netherlands that use Dvorak than people that use the Dutch keyboard layout.

The Dutch commonly use the US keyboard layout, which makes sense because Dutch and English have the same alphabet.

bane · on Oct 19, 2012

here's a bunch (scroll down)

http://keyboard-layout.info/

dbaupp · on Oct 18, 2012

Swedish is one such keyboard.

akkartik · on Oct 18, 2012

I am going to pay more attention to http://en.wikipedia.org/wiki/Keyboard_layout after this thread!

ianstallings · on Oct 18, 2012

That sounds really tough. An extra key press. I bet that wastes like, 10 seconds a day.

plaguuuuuu · on Oct 19, 2012

Is it just me or is this kind of ugly in the first place?

We're trying to mix logic that executes on each element in a sequence with logic that controls how to iterate over that sequence. That is, a "return 42" statement will tell .each to stop iterating, but it's contained within a block that is supposed to do things to individual elements.

The mathematical concept just doesn't sit right with me.. I guess if all you have as an iterator is 'each' then that would necessitate finding an additional way to modify iteration in some way, but still.. I don't like the fact that a return statement can break out of something outside of its own scope

edit: even the low-level alternative seems nicer

    for(blah;blah;blah) { 
        stuff;
    }

msluyter · on Oct 19, 2012

I'd just like to express my appreciation for the depth of this article. I always learn something from Armin Ronacher.

windle · on Oct 19, 2012

I still don't understand why they couldn't just introduce a new keyword that has the same meaning as a missing semicolon. Call it 'ret' or something as a 'lesser return'. Relying on ppl to be semicolon hunters is poor for readability.

It also means you can't put more expressions on the same line because doing so requires a semi-colon which then eliminates the special semicolon behavior. Having an explicit 'ret' keyword means you could accomplish that and have more expressions for a separate block on the same line if desired.
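
For comparison, a short sketch of the two spellings as they exist in present-day Rust, where an explicit `return` keyword is available alongside the tail expression. The function names are hypothetical.

```rust
// Tail expression: no semicolon, the value of `x * 2` is returned.
fn double_tail(x: i32) -> i32 {
    x * 2
}

// Explicit keyword: the same result, spelled with `return` and `;`.
fn double_return(x: i32) -> i32 {
    return x * 2;
}

// Several statements on one line, ending in an explicit `return`.
fn one_line(x: i32) -> i32 { let y = x + 1; return y * 2; }

fn main() {
    println!("{} {} {}", double_tail(3), double_return(3), one_line(3));
    // prints: 6 6 8
}
```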

tome · on Oct 18, 2012

Take a look at Haskell for a very clean and expressive syntax without (required) semicolons.

Dobbs · on Oct 18, 2012

I'd rather have significant semi-colons than significant whitespace[1]. Don't mix presentation and semantics.

1: http://c2.com/cgi/wiki?SyntacticallySignificantWhitespaceCon...

orlandu63 · on Oct 19, 2012

I'd rather have the ability to choose between and freely mix significant whitespace and significant semi-colons within a single source file.

lmm · on Oct 19, 2012

No you wouldn't. See Javascript for all the pain that causes. Language syntax needs to be dictatorial so that the language is consistent for reading (which is more important than writing).

joeyespo · on Oct 21, 2012

JavaScript's problem isn't that you can choose whether or not to use semicolons, it's that they're automatically put in for you if you omit them.

http://lucumr.pocoo.org/2011/2/6/automatic-semicolon-inserti...

Technically, Python is a much better example of optional semicolons since you can use them and they're not required. Of course, the real reason they exist is for compound statements.

http://stackoverflow.com/questions/8236380/why-is-semicolon-...

lucian1900 · on Oct 18, 2012

Just like Python, Haskell only ever has significant indentation.

lilyball · on Oct 18, 2012

Untrue. While it's true that most people use the 'layout' rule, which is the significant whitespace, there is a non-whitespace-significant variant that uses braces and semicolons. You can see an example of how layout is expanded to the alternative syntax at http://www.haskell.org/onlinereport/lexemes.html#lexemes-lay...

lucian1900 · on Oct 19, 2012

No, I meant that Haskell doesn't consider any whitespace that isn't indentation as significant.

lilyball · on Oct 19, 2012

Very few languages consider non-indentation whitespace as significant (not counting using whitespace as a token delimiter, which Haskell definitely does, e.g. "foo bar" =/= "foobar").

the_mitsuhiko · on Oct 19, 2012

Ruby and CoffeeScript do and their syntax relies heavily on it.

vidarh · on Oct 19, 2012

For Ruby at least "heavily" is a big exaggeration. It is significant as a token separator and in cases where inserting a line break makes the expression up to the line break a valid expression in itself, in which case it is treated as one.

You pretty much only need to remember to separate tokens where lack of whitespace might create a different valid token, and ensure any expression that you want to let span more than one line ends with something that expects following tokens to make a complete expression.

Personally I can live with that easily. I can not live with significant indentation, on the other hand...

the_mitsuhiko · on Oct 19, 2012

`foo()` and `foo ()` are different things in ruby, so are `foo[x]` and `foo [x]`. That is significant whitespace.

lucian1900 · on Oct 19, 2012

Yes, but many people assume whitespace everywhere will matter when they hear Python's offside rule described. It's a misconception I've found is worth dispelling, since almost no one minds meaningful indentation.

enjolras · on Oct 19, 2012

Actually, that's a lot of words to say that ; is a binary operator, if I'm correct.

a ; b is the operator which returns the value of b. And, there is some syntactic sugar to make a ; equivalent to a ; nil which returns nil.
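
As a toy model of that reading (not how the compiler actually treats `;`), the operator can be written as an ordinary function. `seq` is a hypothetical helper; the sketch relies on Rust evaluating function arguments left to right.

```rust
// Hypothetical model of `a ; b`: evaluate both, keep only the second.
fn seq<A, B>(_a: A, b: B) -> B {
    b
}

fn main() {
    // Models `println!("side effect"); 5`
    let x = seq(println!("side effect"), 5);
    println!("{}", x); // prints: 5

    // Models `5;` as sugar for `5; ()`
    let y = seq(5, ());
    println!("{:?}", y); // prints: ()
}
```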

yxhuvud · on Oct 19, 2012

I'm all with the author with the Ruby love and it looks like Rust managed to implement them in a cleaner syntactic way compared to Ruby.

However, that template code was utterly hideous.

kibwen · on Oct 19, 2012

Normally in Rust you'd never use a trait like that without bringing it into the local namespace, so you wouldn't need to qualify it with its module like Armin does. And in future versions of Rust, Num will inherit from both the Eq and Copy traits, so the signature would instead look like this:

  fn find_even<T: Num>(vec: &[T]) -> Option<T> {
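
A sketch of how a complete function with that shape could be written today. This is an assumption-laden illustration: the `Num` trait mentioned above is not part of the current standard library, so the bounds below (`Copy`, `PartialEq`, `Rem`, `From<u8>`) are stand-ins chosen for this example, and the body is a guess at the intent rather than the article's code.

```rust
use std::ops::Rem;

// Returns the first even element, or None if there is none.
// The trait bounds are stand-ins for the 2012-era `Num` trait.
fn find_even<T>(vec: &[T]) -> Option<T>
where
    T: Copy + PartialEq + Rem<Output = T> + From<u8>,
{
    for &x in vec {
        if x % T::from(2u8) == T::from(0u8) {
            // `return` leaves the whole function, not just the loop body.
            return Some(x);
        }
    }
    // Tail expression without a semicolon: the fallback result.
    None
}

fn main() {
    println!("{:?}", find_even(&[1, 3, 4, 5])); // prints: Some(4)
}
```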

daakus · on Oct 19, 2012

I haven't looked at Rust, but the semi colon looks like a decision to make some common verbose or "ugly" syntax to be less noisy. Admittedly most syntax has quirks, and this seems more like a quirk rather than an instance of "clever design". Ignoring the explanation of the differences between statements and expressions, the rest of the discussion is about the presence of a semicolon.

dllthomas · on Oct 19, 2012

"But the alternative to semicolons is making line endings significant."

Only sorta - consider Haskell's indent rule: code which is part of an expression should be indented further than the start of that expression. Generally, this is something you should be doing to make your code readable regardless of the statement termination.

lucian1900 · on Oct 19, 2012

And with Rust's strong static typing, it's even easier to spot. While I love a preference for expressions, in CoffeeScript I've had to put a single null at the end of functions several times.

The presence/absence of the ; is both explicit and subtle enough to not annoy.

peripetylabs · on Oct 19, 2012

As an aside, the "power_it" function in Python can also be written as:

    map(lambda t: t ** 2, [1,2,3,4])

or even:

    [(lambda t: t**2)(x) for x in [1,2,3,4]]

irahul · on Oct 19, 2012

> [(lambda t: t**2)(x) for x in [1,2,3,4]]

Why would you have that lambda?

    [x**2 for x in range(1, 5)]

Scramblejams · on Oct 19, 2012

My kingdom for a TL;DR!

(As a prerequisite to making its point, the article teaches many intricate details regarding two dynamic languages which even most of their practitioners probably never think about, then dives into more intricate details regarding a language which most of us have probably never even used, let alone grokked all the details of. I sense that there's something important here for me to learn, but from where I sit this article is a lot to bite off all at once.)

A tight summary by someone who understands all this would be appreciated by many readers, not just myself.

irahul · on Oct 19, 2012

> A tight summary by someone who understands all this would be appreciated by many readers, not just myself.

A tight summary of all of it would be as long as the article. However, the point about semicolons in Rust is:

1. ; is a separator, not a terminator.

2. a;b separates a and b.

3. a; is a special case which means a;nil(or whatever is the equivalent in Rust)

4. The last expression in a function will be the return value of the function.

5. If the last line in a function is "a", it returns a. If it's "a;", it returns nil(from 3)

Scramblejams · on Oct 19, 2012

Perfect, thank you.

cmccabe · on Oct 19, 2012

I don't think parent deserves to be modded down. The article did take a pretty long time to get to the point.

Why all this talk about Python and Ruby? Rust is not really competing with those languages-- it is quite specifically designed as a C++ replacement. They should be comparing themselves against C++, Golang, or D.

The Golang solution would just be a function which returns the next element or nil if there are no more elements. That function might be part of an interface, if you wanted to generalize it across several types. To me, this is a lot simpler than the other stuff that was discussed.

To go back to Rust specifically, rather than having magic semicolons, why not make the "break" statement take a value, which becomes the return value of the block? Despite all the tl;dr there was not much discussion of design alternatives.
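
For what it's worth, present-day Rust does let `break` carry a value out of a `loop`, in addition to (not instead of) the semicolon rule. A minimal sketch, with a hypothetical function name:

```rust
// Returns the smallest power of two that is greater than `limit`.
fn first_power_of_two_above(limit: u32) -> u32 {
    let mut n = 1;
    // The `loop` is itself an expression; `break n` supplies its value,
    // and since the loop is the tail expression, that value is returned.
    loop {
        if n > limit {
            break n;
        }
        n *= 2;
    }
}

fn main() {
    println!("{}", first_power_of_two_above(10)); // prints: 16
}
```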

comex · on Oct 18, 2012

Neat. I came up with this rule some time ago for my as-yet-unimplemented pet language, but I didn't realize Rust did the same thing.

drivebyacct2 · on Oct 18, 2012

Some of this seems cool and some of it is soaring over my head. I'm kind of embarrassed and newly motivated to learn Rust.