The Path to Rust

pimeys · on May 26, 2016

In my current job, I was given the task of writing a service that should eventually handle billions of events as fast as possible. The language choice was left to me, and I was considering C++14, Scala, Go or Rust. C++ has its quirks and I don't really enjoy its build tool of choice, cmake. Scala I can write fast, but scaling the app in Mesos would consume lots of memory; every task would take a huge slice of RAM because of the additional JVM. Go I just don't like as a language (personal taste) and I think the GC adds a bit too much pausing for this app, so I gave Rust a shot.

The first week was madness. I'm a fairly experienced developer and the Rust compiler rapped me on the knuckles constantly. Usually the only way out was to rewrite the whole part with a different architecture. By the second week I was done with my tasks, very comfortable with the language and just enjoying the tooling. I was also relying on tests much less because of the compiler, even less than with Scala. If it compiles, there's a good chance it works. Cargo is awesome. Let me repeat: Cargo is awesome. I also like that I can write code with Vim again, and even though for some things I need to read the Rust source, it is pretty easy to get answers if you're in trouble. In the end I wrote some integration tests with Python and I'm quite happy with them.

Now I want to write more Rust.

karterk · on May 26, 2016

On the other hand, as your system gets more complex, you will start missing tried and tested libraries from some of the more mature languages you have mentioned. Maybe the extra time you have to spend writing that code will compensate for the extra memory you would have spent because of the JVM. Not taking anything away from Rust - I'm just giving an alternative perspective.

pimeys · on May 26, 2016

That is a risk we've analyzed and are willing to take. Scala can stay on the data side of our platform, but the real time part will be either Rust or C++14 (and Python where we don't need raw speed and can handle everything with one thread). For the libraries the situation is not that bad. Something I was missing from the Scala side was Kamon. It is a really awesome library I've been using (and contributing to) for some time. For Rust I needed to build my own solution with threads and a bit of macro love. It works, it uses a constant amount of RAM, and when we get our next Rust project running, I might open source that part for everybody to use...

annnnd · on May 26, 2016

As far as I can tell, Kamon is an application monitoring system - what do you monitor? I am curious as I have never felt the need to monitor apps this way (but I have never worked on HFT apps obviously).

pimeys · on May 26, 2016

Also, one thing I would love to see is the syntactic sugar for binding, like Haskell's `do` and Scala's `for`. Would be super useful for lots of things, e.g. validating requests in a web server.
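
Rust has no general `do`-style notation, but for the `Result` case the `?` operator (stable since Rust 1.13, with the `try!` macro before it) gives a similar early-exit chain. A minimal sketch of request validation in that style; `Request`, `NewUser`, `required` and `validate` are hypothetical names made up for the illustration, not part of any web framework:

```rust
use std::collections::HashMap;

/// Hypothetical request type, used only for this illustration.
pub struct Request {
    pub params: HashMap<String, String>,
}

/// Hypothetical validated payload.
#[derive(Debug, PartialEq)]
pub struct NewUser {
    pub name: String,
    pub age: u8,
}

/// Look up a required parameter, or fail with a message.
fn required<'a>(req: &'a Request, key: &str) -> Result<&'a str, String> {
    req.params
        .get(key)
        .map(|v| v.as_str())
        .ok_or_else(|| format!("missing parameter: {}", key))
}

/// Each `?` either unwraps the `Ok` value or returns the `Err` early.
pub fn validate(req: &Request) -> Result<NewUser, String> {
    let name = required(req, "name")?;
    if name.is_empty() {
        return Err("name must not be empty".to_string());
    }
    let age = required(req, "age")?
        .parse::<u8>()
        .map_err(|e| format!("bad age: {}", e))?;
    Ok(NewUser { name: name.to_string(), age })
}
```

This only covers types that `?` supports (`Result`, `Option`); it is not a general monadic binding.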

pimeys · on May 26, 2016

Counters, times and such. These are to be sent to a service like InfluxDB and monitored with a tool such as Grafana.

loup-vaillant · on May 26, 2016

I bet a fair number of those tried and tested libraries expose a C interface. As far as I know, Rust has an FFI. Those libraries are only a set of bindings away. So I think we can do away with Java libraries, as well as the occasional header-only C++ template madness¹.

1: I mean, it's madness if you want to talk to it from the outside. If you stick with C++, this is "merely" flamewar material (take the Boost devs and the C++ committee on one side, and game devs like Mike Acton, Jonathan Blow, or Casey Muratori on the other side).
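
A minimal sketch of what "a set of bindings away" looks like, assuming the C standard library is linked (the default for ordinary Rust programs on mainstream platforms). `strlen` and `abs` are the real libc functions; `c_strlen` and `c_abs` are hypothetical wrapper names chosen for the example. Bindings for a larger library would more likely be generated than written by hand.

```rust
use std::ffi::CString;
use std::os::raw::{c_char, c_int};

// Declarations of two C functions. On the 2024 edition this block has to
// be written as `unsafe extern "C"`; earlier editions accept it as shown.
extern "C" {
    fn strlen(s: *const c_char) -> usize;
    fn abs(x: c_int) -> c_int;
}

/// Safe wrapper: `CString` guarantees a NUL-terminated buffer for C.
pub fn c_strlen(text: &str) -> usize {
    let owned = CString::new(text).expect("no interior NUL bytes");
    // Sound because `owned` stays alive for the duration of the call.
    unsafe { strlen(owned.as_ptr()) }
}

/// Safe wrapper around C's `abs`.
pub fn c_abs(x: c_int) -> c_int {
    unsafe { abs(x) }
}
```

The `unsafe` blocks mark the spot where the compiler has to trust the declared signatures; everything above them is checked as usual.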

lmm · on May 26, 2016

> I bet a fair number of those tried and tested libraries expose a C interface.

Some of them do, or have C equivalents, but many of them don't. Once you get into "there's only one library that does it and it's one guy's graduate thesis" territory, often the only option is on the JVM.

loup-vaillant · on May 26, 2016

> "there's only one library that does it and it's one guy's graduate thesis"

Those libraries aren't tried and tested. They can be reimplemented. And in some contexts, they shouldn't be trusted as a black box.

jleahy · on May 26, 2016

The last time I was in that situation, the only option was lisp. I'm still supporting a lisp environment for that one piece of code.

Varriount · on May 26, 2016

Have you looked at languages like Nim [1]? Garbage collectors don't have to be of the stop-the-world variety.

[1] http://nim-lang.org/ http://nim-lang.org/docs/gc.html

TD-Linux · on June 1, 2016

That does look like a stop-the-world garbage collector, just with a tunable maximum runtime.

gizzlon · on May 26, 2016

> Go I just don't like as a language (personal taste) and I think the GC adds a bit too much of pausing for this app

Not saying you have to like Go, we all have different preferences. But FYI, the gc has gotten much better in the latest releases. I don't think you would be bothered by gc pauses. (my understanding is that it's both faster and more "spread out")

jeffdavis · on May 26, 2016

Personally, I don't care about the pauses, I care about having a complex runtime.

Consider a widely deployed library written in C -- say, zlib or libjpeg or SQLite or openssl. Could you rewrite those in Go or Haskell or Scala? No. Because nobody wants a big surprise when linking a harmless C library all of a sudden brings in an entire GC that's starting extra threads, etc.

In other words, Rust is the first practical language in a long time that could be used for a new library that might be used in millions of different applications written in dozens of languages.

pimeys · on May 26, 2016

We're doing real time stuff where these pauses matter. It's just that we don't really need a GC for anything, and adding one to certain services causes more harm than good.

pimeys · on May 26, 2016

If the execution is measured in nanoseconds, the GC pauses are measured in milliseconds, sometimes tens or hundreds...

Also RAM has a cost: for a garbage-collected service you need to reserve some extra RAM. Scaling these processes adds extra memory overhead we're not so eager to take.

gizzlon · on May 26, 2016

Oki, got it. Can't decide if that sounds like horrible conditions or a fun challenge =)

For reference, the Go 1.5 GC propaganda:

> Go 1.5, the first glimpse of this future, achieves GC latencies well below the 10 millisecond goal we set a year ago.

Blog post: https://blog.golang.org/go15gc

Slide: https://talks.golang.org/2015/go-gc.pdf

Talk: https://www.youtube.com/watch?v=aiv1JOfMjm0

Edit: Meanwhile, over in Java land they measure stop the world pauses in seconds, not milliseconds:

http://stackoverflow.com/questions/15696585/long-gc-pauses-i...

https://blogs.oracle.com/poonam/entry/troubleshooting_long_g...

vvanders · on May 26, 2016

10ms is also 2/3rds of a frame @ 60FPS which is going to guarantee dropped frames if you're doing anything soft real-time.

Also most GC'd languages don't give you strong control over data locality(by nature of everything being a reference) so you pay in cache misses which are not cheap.
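
For contrast, a small sketch of by-value layout in Rust: a `Vec<Particle>` stores its elements back to back, so a loop over it walks contiguous memory instead of chasing one reference per element. `Particle`, `stride_in_bytes`, `is_contiguous` and `step` are hypothetical names used only for this illustration.

```rust
use std::mem::size_of;

/// Hypothetical particle type, stored by value.
#[derive(Clone, Copy, Default)]
pub struct Particle {
    pub pos: [f32; 3],
    pub vel: [f32; 3],
}

/// Byte distance between the first two elements (needs at least two).
pub fn stride_in_bytes(items: &[Particle]) -> usize {
    let first = &items[0] as *const Particle as usize;
    let second = &items[1] as *const Particle as usize;
    second - first
}

/// True when elements sit back to back, with no per-element indirection.
pub fn is_contiguous(items: &[Particle]) -> bool {
    items.len() < 2 || stride_in_bytes(items) == size_of::<Particle>()
}

/// Plain loop over contiguous memory.
pub fn step(items: &mut [Particle], dt: f32) {
    for p in items.iter_mut() {
        for axis in 0..3 {
            p.pos[axis] += p.vel[axis] * dt;
        }
    }
}
```

A `Vec<Box<Particle>>` would be the closer analogue of the "everything is a reference" layout, with each element allocated separately.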

jerf · on May 26, 2016

Go gives you decent control over layout, much better than most scripting languages. It's not quite as good as C, mostly due to not having a wide variety of very complicated data structures to choose from; you've got structs, arrays, and maps. But the first two in particular let you do many things to control locality.

It's also worth pointing out 10ms is the maximum, not the minimum. There's plenty of workloads where you're not going to see pauses anywhere near that long. It's certainly not impossible to write a 3D game in Go and get 60 fps with high reliability, especially with the amount of stuff nowadays that actually runs on the GPU. You're not going to be able to do this for your next AAA game, but I wouldn't see a huge reason that an "indie" game couldn't use Go for that. (Probably not my first choice, all else being equal with library support personally I'd suggest Rust for this for a lot of other reasons (the type of concurrency you often get in games is going to be well supported with Rust's borrow checker architecture), but contrary to some discussion I would say it is still on the table.)

jleahy · on May 26, 2016

It's not worth it, because as soon as you go down this route you're going to be constantly thinking about the GC.

If I add a cyclic reference here will that make GCs longer? Maybe I should have a free list and reuse these objects (after all they're responsible for most of my allocations)?

As soon as you're thinking like that as you write each line of code you've lost all of the benefits of the language being high-level, and you'd be better off controlling memory manually.

jdright · on May 27, 2016

> It's certainly not impossible to write a 3D game in Go and get 60 fps with high reliability, especially with the amount of stuff nowadays that actually runs on the GPU.

Maybe not impossible, but highly highly improbable. If you say that then you have no idea what it takes to run 60fps constantly. It is HARD, VERY HARD.

AnimalMuppet · on May 26, 2016

> It's certainly not impossible to write a 3D game in Go and get 60 fps with high reliability...

But it's hard to write a 3D game in Go and get 60 fps with certainty.

jerf · on May 26, 2016

But so what? It's impossible to write a 3D game in anything and get 60 fps with certainty. If the McAfee Disk Bandwidth Consumer decides to kick in, you may take a penalty on disk access. If the system specs are slightly lower than you planned for, you don't get your guaranteed 60 fps. If the GPU overheats and underclocks, you don't get 60fps.

It's not that hard to write something in Go where you pay two milliseconds every 5 minutes or something, or less than that. Again, let me reiterate, 10ms per sweep is the max, not the min. Plus it sounds like people think that Go somehow guarantees that you're going to pay this on every frame or something, rather than potentially separated by seconds or minutes.

As I said, Go probably isn't my first choice for a game anyhow, but people are grossly overstating how bad it is. Games may be high performance, but they also do things like run huge swathes of the game on a relatively slow scripting language like Lua or some Lisp variant or something. It's not like the network servers that are Go's core design motivation are completely insensitive to performance or latency either.

(Plus to be honest I reject the premise that all games must be 60 fps on the grounds that they already aren't. I already disclaimed AAA vs. Indie. But it's still not a bad discussion.)

zzzcpan · on May 27, 2016

GC pauses are not the real reason people hate GCs even when they say otherwise. It's mostly about the cognitive overhead GC imposes on you and compromises that it forces you to take. If it's there - you are always aware of it, you cannot ignore it, you know it's unpredictable, but you know there are better predictable choices and it makes it very hard to feel good about the quality of the software you write. It's like it forces you to accept mediocrity.

vvanders · on May 27, 2016

Lua, slow? I don't even know where to begin to start on that one.

Like I said in another part of the thread, different tools for different domains. You certainly can get a consistent 60 FPS on any consoles, which are also much less forgiving about techniques that would be perfectly fine on a PC.

shinymark · on May 26, 2016

For another comparison 10 ms is nearly your entire frame at 90hz for current VR headsets.

nindalf · on May 26, 2016

I encourage you to read the links on the comment you responded to. If you had, you would have found that as of Go 1.5 the pauses are 2ms or less for a heapsize < 1GB. In Go 1.6 this reduced further. The 10ms pause that you're thinking of is the upper limit, not what actually happens even at heapsizes of 100GB+.

vvanders · on May 26, 2016

2ms is still more than the entire budget we'd dedicate to animation on a lot of scenes. You want predictable performance in these cases and tracing GCs are fundamentally opposed to this.

Generally you're much better off using a scripting language like Lua for the places where you want the advantages GC brings but scoping it so you can limit the damage it does to your frame time.

nickpsecurity · on May 26, 2016

What about microseconds using the category of collectors designed for your application area (real-time or minimal delay)?

http://www.cs.technion.ac.il/~erez/Papers/real-time-pldi.pdf

vvanders · on May 26, 2016

CAS on every property write? That adds up really quickly which is why Rust has Rc vs Arc.

The issue with GC is it's not deterministic. There's a whole nother aspect that has to do with free heap space. Quite a few GCs need some multiple of working set free to do the compact phase. If they don't have it then GC times start to spiral out of control.

On an embedded device(or modern game console) you'll be lucky to have 5-10mb free. On PSP we used to have only 8mb total since 24/32mb went to video+audio. We still ran Lua because we could constrain it to a 400kb block and we knew it would never outgrow it.

Just like everything in software there's different tools for different domain spaces. Otherwise someone would just write one piece of software that fits every problem and we'd all be out of a job.

nickpsecurity · on May 26, 2016

I don't have the working set numbers on that example. I just had latency, which you were discussing, which maxed out at 145 microseconds per pause on the highest-stress test. Usually lower. As far as working set goes, there's a whole subfield dedicated to embedded systems. One early one from IBM that I found benchmarks for on a microcontroller had 85% peak performance with 5-30% working set for GC. They basically trade in one direction or another.

The more common strategy I see in CompSci for your use-case is to have a mix of memory pools, GC, and safe manual. Ada had memory pools and I'm sure you get the concept. Safe manual is when static analysis shows a delete can happen without safety consequences. So, that can be unmanaged. Then, what's left is handled by the concurrent, real-time GC. In typical applications, that's a small fraction of memory.

vvanders · on May 26, 2016

Yup and in games that's exactly the space that Lua/UnrealScript/etc fit neatly into.

The issue is with using GC based language for areas where you need high throughput and low latency(there's the whole cache miss thing which GC exacerbates).

cmrdporcupine · on May 26, 2016

10ms latency is crap if you're doing something like realtime ad exchange bidding, where you have <60ms bid times. Really do not want that hiccup.

gizzlon · on May 26, 2016

Curious, how long can a pause be before it's a problem? (I do not write these kind of services)

jerf · on May 26, 2016

"It depends". Some services are insensitive to pauses of several minutes (email processing; you may not want it there all the time but spikes to that level are often acceptable). Some services are sensitive to pauses in the single-digit milliseconds (for instance trying to maintain a high frame rate).

dllthomas · on May 26, 2016

Two years ago I was working on a system that cared about single digit microseconds, luckily with low throughput.

jleahy · on May 26, 2016

Lots of people have to worry about single digit microseconds with high throughput.

dllthomas · on May 27, 2016

I don't think I implied otherwise.

falcolas · on May 26, 2016

How do you handle the reference counting pauses in Rust then; rely on them being deterministic? Or do you completely avoid the reference counting boxes?

anp · on May 26, 2016

How does reference counting pause? There's no stop-the-world GC action, as the garbage collection is amortized over every object deallocating itself.

cmrdporcupine · on May 26, 2016

That's not entirely true, if your decrement results in a cascading free of an entire tree of objects you will pay for the deallocation time for that entire tree -- decrement, free, decrement children, free, etc..

And unless your RC algorithm is more clever than most that's going to be stop-the-world.
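
The cascade described here can be observed directly. A small sketch, using a hypothetical `Node` type whose `Drop` impl bumps a shared counter: dropping the only handle to the head of a chain frees every node during that one call, so the cost is proportional to the size of the structure and is paid on the thread that did the final decrement.

```rust
use std::cell::Cell;
use std::rc::Rc;

/// A singly linked node that bumps a shared counter when it is dropped.
pub struct Node {
    pub next: Option<Rc<Node>>,
    pub dropped: Rc<Cell<usize>>,
}

impl Drop for Node {
    fn drop(&mut self) {
        self.dropped.set(self.dropped.get() + 1);
    }
}

/// Build a chain of `len` nodes and return its head.
pub fn build_chain(len: usize, dropped: &Rc<Cell<usize>>) -> Option<Rc<Node>> {
    let mut head = None;
    for _ in 0..len {
        head = Some(Rc::new(Node { next: head, dropped: Rc::clone(dropped) }));
    }
    head
}

/// Drop the only handle to the head and report how many nodes that
/// single decrement ended up freeing.
pub fn freed_by_one_drop(len: usize) -> usize {
    let dropped = Rc::new(Cell::new(0));
    let head = build_chain(len, &dropped);
    drop(head); // the head's count hits zero, and the rest follows
    dropped.get()
}
```

The default drop here is recursive, so very long chains can also exhaust the stack; the sketch keeps the chain short.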

TheHydroImpulse · on May 26, 2016

Rc is scoped to a single thread, so it'll be, at worst, stop-the-thread.

kibwen · on May 26, 2016

Not the OP, but in my experience it's fairly rare to encounter someone using the `Rc` type in Rust. It's nowhere near as prevalent as `shared_ptr` seems to be in C++, for example.

pimeys · on May 26, 2016

There is also the `Arc` type, which is an atomic reference counter. You still need those, especially if you need to share stuff between multiple threads.
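
A minimal sketch of sharing read-only data between threads with `Arc`; `parallel_sum` is a hypothetical function made up for the example. Swapping `Arc` for `Rc` here would not compile, because `Rc` is not `Send` and `thread::spawn` requires it.

```rust
use std::sync::Arc;
use std::thread;

/// Sum a shared, read-only table from several threads.
pub fn parallel_sum(data: Vec<u64>, workers: usize) -> u64 {
    let workers = workers.max(1);
    let shared = Arc::new(data);
    let handles: Vec<thread::JoinHandle<u64>> = (0..workers)
        .map(|id| {
            // Cloning the Arc is an atomic increment, not a copy of the data.
            let local = Arc::clone(&shared);
            // Worker `id` takes elements id, id + workers, id + 2 * workers, ...
            thread::spawn(move || local.iter().skip(id).step_by(workers).sum::<u64>())
        })
        .collect();
    handles
        .into_iter()
        .map(|h| h.join().expect("worker panicked"))
        .sum()
}
```

The vector is freed when the last clone of the `Arc` goes away, whichever thread that happens on.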

Jweb_Guru · on May 27, 2016

You can often get away with using scoped threads.
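
A sketch of the scoped-thread approach, assuming a toolchain that has `std::thread::scope` (in the standard library since Rust 1.63; older code had to get scoped threads from external crates). `scoped_sum` is a hypothetical function name. Because the scope joins every thread before it returns, the workers can borrow the slice directly and no `Arc` is needed.

```rust
use std::thread;

/// Sum two halves of a borrowed slice on two threads, without `Arc`.
pub fn scoped_sum(data: &[u64]) -> u64 {
    let (left, right) = data.split_at(data.len() / 2);
    thread::scope(|s| {
        // Both closures only borrow; the scope guarantees they finish
        // before `data` can go away.
        let a = s.spawn(|| left.iter().sum::<u64>());
        let b = s.spawn(|| right.iter().sum::<u64>());
        a.join().expect("left worker panicked") + b.join().expect("right worker panicked")
    })
}
```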

pimeys · on May 26, 2016

This is a good article about different wrapper types in Rust http://manishearth.github.io/blog/2015/05/27/wrapper-types-i...

vvanders · on May 26, 2016

Yeah, a lot of that has to do with the borrow checker guiding you towards single-ownership designs which is a good thing(tm).

Matthias247 · on May 26, 2016

I played around a little with Go recently, and in the process ported part of an older application for which I already have C++, C#, Java and other implementations and benchmarks to compare against.

My current result is that Go performance can be really good (C++ level) if I try to avoid allocations as much as possible. With a sloppier implementation that e.g. did not reuse buffer objects ([]byte) between multiple network packets, the performance dropped significantly, to about 1/3 of the C++ implementation, and the GC time dominated.

Fortunately it's quite easy to avoid garbage in Go, as we have value types there and the synchronous programming style means lots of stuff can be put on the stack and there's not so much garbage due to captured objects in closures and similar stuff. All in all I'm quite confident that with a moderate amount of optimization Go can be suitable for lots of application types. Although I would not necessarily try to use it for low-latency audio processing if something else (that produces garbage) is running in the same process.

dorfsmay · on May 27, 2016

If you're going to work at that low a level, what are the advantages of golang over rust?

lmm · on May 26, 2016

Is the JVM overhead really significant these days? You're talking about wasting what, maybe 32MB for each instance? (which can have multiple threads, you don't need one per core). Is your RAM really so limited that you'd run out of RAM before maxing out all your CPUs or I/O in a Scala version?

(I am kind of interested in Rust, but not going to touch it until it gets proper HKT support - you have to duplicate so much code otherwise)

pimeys · on May 26, 2016

It is a lot, when we need to scale by adding more Mesos tasks. Every task launches a new JVM instance, which adds to the total memory usage. Also by using a GC and not wanting to collect all the time, you need to add more memory overhead, which is not required with Rust.

Comparison: I have one app written with Scala and consuming 350 megabytes per Mesos task. The similar-sized Rust app is using 8-10 megabytes of RAM. When you have 1000 instances running during a peak, this escalates fast.

lmm · on May 27, 2016

A bit odd to be running a VM on a VM. Did you consider using a Java "application server" or similar that can keep a VM running and run tasks within that?

reality_hacker · on May 26, 2016

So, for jvm you create less tasks but with more ram, e.g. 100 tasks 4G RAM each, and problem is solved?

pimeys · on May 27, 2016

Having more of the smaller tasks spreads out the load more evenly, and when Rust apps start in less than a second, it's easy to fine tune and reason about having more tasks running. With JVM, we just waste memory because of the overhead from GC, and I don't feel comfortable with the idea.

fn1 · on June 1, 2016

Why would you need to start up an additional JVM for every task. Can't one JVM run multiple tasks?

gdw2 · on June 1, 2016

The mention of Mesos may imply that each instance would be a separate container.

webscalist · on May 26, 2016

link to source code? want to see what 2-weeks old rust programmer writes and how feasible it is to train our team

pimeys · on May 26, 2016

Nothing open source yet. I might take the metrics system out and publish it on Github when I've used it in another service and it is a bit more generalized.

loup-vaillant · on May 26, 2016

> Rust is not the most beginner-friendly language out there — the compiler is not as lenient and forgiving as that of most other languages […], and will regularly reject your code […]. This creates a relatively high barrier to entry […]. In particular, Rust’s “catch bugs at compile time” mentality means that you often do not see partial progress — either your program doesn’t compile, or it runs and does the right thing. […] it can make it harder to learn by doing than in other, less strict languages.

I don't see how making the type system stricter makes the language harder to learn. Maybe that's because I know another relatively paranoid type system (Ocaml), but still.

A type system that rejects your code is like a teacher looking at a proof you just wrote, and tells you "this doesn't even make sense, and here's why". It may be frustrating, but this kind of feedback loop is tighter than what you would get from a REPL.

And you do see partial progress: the type errors change and occur further in the source code as you correct your program. Each error is an opportunity to fix a typo or a misconception. The distinction between a broken prototype that doesn't even compile and a working program isn't binary: when you correct a type error, your program is less broken, even though it doesn't compile yet.

pYQAJ6Zm · on May 26, 2016

> A type system that rejects your code is like a teacher looking at a proof you just wrote, and tells you "this doesn't even make sense, and here's why".

Not only that: the Rust compiler, for many cases, will suggest calling it with a ‘--explain’ argument to get an explanation of what’s causing the error and, usually, how to solve it. A very friendly feature.

fizzbatter · on May 26, 2016

I'm not trying to bash Rust, i love it, so for random viewers looking to pick a fight.. please don't bother. I'm only citing what my experience thus far is.

It's confusing, as hell.

Let me try to explain a little bit. I'm an average dev, and have been for ~7 years or so. My languages have typically been dynamic, but the past 2 years have been heavily Go.

I picked up the majority of Rust, as well as basic borrow checker usage, in a matter of hours. But that's where the fun stopped. Generic syntax beat me up (mostly related to Static dispatch and Type Assertions). Lifetime syntaxes are currently kicking my ass as well. The problem as i see it, seems to be that i am a more.. "dig in" developer. And frankly, that does not seem to fly here. I've read a lot of the rust docs, but rather than start to finish, it's been in response to questions i have. This seems to be problematic for rust, as important concepts can be left in the dust.

A good example of this is Static vs Dynamic Dispatch in Rust. Heavy trait usage combined with using Static Dispatch incorrectly resulted in code not designed for what i wanted, and lots of seemingly random bugs. Ie, `foo = Foo::new()` would fail, but `foo = Foo::new(); foo.bad(baz);` would succeed.

Currently i'm trying to figure out how to either move a variable out of a block, or, more importantly define a lifetime on `Ok(r) => r,` so that the result can escape the block.

Anyway. None of this is the fault of rust exactly, and again, i love this language. I'm switching because Go hasn't offered the full safety i want. But man, it has been quite the hellish experience to get started.

steveklabnik · on May 26, 2016

Have you dropped by IRC? Always happy to help answer questions in real-time.

> how to either move a variable out of a block,

Blocks evaluate to a value, so if it's a movable type, it will move.

    let x = {
        let s = String::new();
        // stuff happens with s

        s
    };

Here, s will move out of the block, and into x.

> more importantly define a lifetime on `Ok(r) => r,` so that the result can escape the block.

Lifetimes cannot extend the life of something, they are descriptive, not prescriptive. So I think you might be trying something that's impossible...

fizzbatter · on May 26, 2016

I hop into Rust beginners from time to time, but it's usually pretty empty so a bit hard to get an answer. I'm very likely explaining the problem incorrectly, so let me give you an example: https://gist.github.com/anonymous/3d72907031ce18d002364e756f...

As you can expect, line 13 is trying to borrow a value that does not live long enough for `req`. Unfortunately, hyper::client::Client's req.body() requires a &str it seems. Somehow i need to pass it a &str (which i would normally do via &String).

Do i need to move s:String out of the block? Do i need to specify a lifetime? These are the sort of "What do i even search for!?" moments i run into heh. Granted, less so these days.. thankfully.

Note that `&mut Some(ref c)` looks pretty terrible, i'm experimenting with an API i'm writing.

steveklabnik · on May 26, 2016

I try to idle in there during NYC 9-5ish at least, but depending on your time zone, it can be tough.

Without compiling myself, it can be a bit tricky, but...

Yes, so the issue here is, your String will go out of scope at the end of the `if let`, on line 14. But you're trying to store a reference to it in something that lives longer: the req will continue to live afterwards. So yes, you need to move s outside of the block somehow; there are a few different ways of doing this. What I'd try first is something like this: https://gist.github.com/steveklabnik/54cf7a4a522cd1a7c6e0130...

Now that the binding lives in the outer scope, it will live longer than the `if let`, you're basically moving it out. I'm trusting the compiler's control flow analysis here; it should let you do this, given that you only use s after you've assigned to it.
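
A sketch of the pattern described here (not the contents of the linked gist): the binding is declared in the outer scope, before the thing that will borrow it, and only assigned inside the `if let`. `Request` is a hypothetical stand-in for a builder that borrows its body; it is not hyper's actual type, and `send` is likewise made up for the example.

```rust
/// Hypothetical builder that borrows its body for lifetime `'a`.
pub struct Request<'a> {
    pub body: Option<&'a str>,
}

impl<'a> Request<'a> {
    pub fn new() -> Self {
        Request { body: None }
    }

    pub fn body(mut self, body: &'a str) -> Self {
        self.body = Some(body);
        self
    }
}

/// Returns the length of the body that ended up on the request.
pub fn send(payload: Option<&[u8]>) -> usize {
    // Declared before `req`, so it outlives it; assigned at most once.
    let s: String;
    let mut req = Request::new();
    if let Some(bytes) = payload {
        s = String::from_utf8_lossy(bytes).into_owned();
        // Fine: `s` belongs to the outer scope, not to this block.
        req = req.body(&s);
    }
    req.body.map_or(0, |b| b.len())
}
```

Declaring `let s = ...` inside the `if let` block instead is what produces the "does not live long enough" error, since `req` is still used after the block ends.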

fizzbatter · on May 26, 2016

Indeed, that worked! So, it's good to know i had the right idea (moving the data out of the block), but i didn't think to store it above the request body.

I was trying to be clever, and explicitly say that `s` should live for the lifetime of the parent function. Any idea if that is possible in this case?

edit: Sidenote, i appreciate that you hang in Rust Beginners, i didn't mean to imply that people weren't being helpful. Just that, it can be difficult to find a solution at times.

On the plus side, i managed to help (i hope haha) a guy in #rust-beginners today, so hopefully i've paid it forward a bit. Appreciate your help!

steveklabnik · on May 26, 2016

Yeah no worries :)

> Any idea if that is possible in this case?

This is what I meant by lifetimes being descriptive, not prescriptive: you can't add an annotation and make something live longer. Moving the binding itself is the only way.

openasocket · on May 26, 2016

OK, I see what the problem is. This is the type signature for the body method:

    fn body<B: Into<Body<'a>>>(self, body: B) -> RequestBuilder<'a>

The RequestBuilder struct tries to be clever: it just keeps a pointer to what you want the body to be rather than taking ownership of it or copying it. So when you say that "req = req.body(&s)" you've given the request a pointer to your string, but that string is freed inside that if-block. I don't see an obvious fix right now, but I'll update this if I see something.

fizzbatter · on May 26, 2016

Yea, that's another thing about rust. At times, things like what Hyper's Body method is doing can be quite puzzling. And while Rust's attempt to help is noble, it can end up even more confusing. At times it is awesome, but it can be a lot of visual noise for beginners.

Fwiw, https://news.ycombinator.com/item?id=11780196 has the answer to my example problem. I'm also inquiring about using lifetimes to solve the problem.

IshKebab · on May 26, 2016

Because a lot of the time it does make sense and the borrow checker just isn't clever enough to know. For example: https://github.com/rust-lang/rfcs/issues/811

loup-vaillant · on May 26, 2016

Then the borrow checker is just flawed.

I know, I know, no non-trivial checker can be both sound and complete. But if too many useful programs end up being rejected because of that gap, the type system must be changed.

ekidd · on May 26, 2016

My experience with Rust's borrow checker is that it can be pretty frustrating for about a week, and then it mostly stays out of the way, only popping up to report dumb bugs. After that, maybe once per 2,500 lines of code written, I still run into a situation that requires some finessing to pass the borrow checker. It's definitely not perfect, and there are known places where the checker needs to be enhanced (and that work is underway in some cases).

Overall, I'm happy to accept the tradeoffs of the borrow checker. I have to deal with a learning curve and the occasional workaround. But I get fast, correct and expressive code. And even though I like to think of myself as being extremely careful about pointer ownership, the borrow checker still catches real problems in my code that would be a nightmare to debug, especially once threads get involved.

robohamburger · on May 26, 2016

Agreed (though for me it was a bit longer than a week)!

Knowing when to use borrowing over owned copies I think is vital as well. Once I started planning out what was going to own what my code became a lot simpler and I spent less time fighting the borrow checker in general.

Hopefully as rust evolves so will the tooling and the borrow checker. It seems like you could build some cool visual aids and/or debugging info into IDEs for novices who are new. --explain already sort of does this.

jandrese · on May 26, 2016

Mostly, it just trains you to avoid the patterns that cause it problems. As long as this doesn't result in horribly convoluted code it's not really a big deal, it just increases the ramp up time for the language.

oconnor663 · on May 26, 2016

A lot of these cases end up with a workaround like "you need to bind this temp value to a local variable before you use it" or "you need to put this section in curly braces". These things are annoying, and there's lots of room for the compiler to infer more of them, but they don't really get in the way of writing useful programs. In the cases where I've seen the compiler really get in the way, it's because the program wants to do something actually unsound (by Rust's standards), like taking a mutable reference to a list while something else is iterating over it.
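
Two of those situations as a short sketch, with the rejected forms left in comments. All function names are hypothetical and exist only for the illustration.

```rust
/// Append a doubled copy of every even number already in the list.
pub fn append_doubled_evens(list: &mut Vec<u32>) {
    // Rejected: `push` needs `&mut` access while the iterator still holds
    // a shared borrow, and a push may reallocate the buffer under it.
    //
    // for x in list.iter() {
    //     if x % 2 == 0 { list.push(x * 2); }
    // }

    // Workaround: finish reading first, then mutate.
    let extra: Vec<u32> = list.iter().filter(|x| **x % 2 == 0).map(|x| x * 2).collect();
    list.extend(extra);
}

fn read_line() -> String {
    String::from("  hello  ")
}

/// Length of the line after trimming whitespace.
pub fn trimmed_len() -> usize {
    // Rejected: the String returned by `read_line()` is a temporary that
    // is dropped at the end of the statement while `t` still borrows it.
    //
    // let t = read_line().trim();

    // Workaround: bind the temporary to a local first.
    let line = read_line();
    let t = line.trim();
    t.len()
}
```

The first case is the genuinely unsound one; the second is only a matter of giving the temporary a name.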

bjz_ · on May 26, 2016

This is a perspective that I think we could do better at advocating for when it comes to static typing. It's not really just about safety, it's also about being able to rapidly figure out a good model for your problem through a conversation with the type system. It will not hesitate to let you know when you don't make sense, or you haven't explained things correctly!

bluejekyll · on May 26, 2016

I agree with the statement that Rust has a steeper learning curve, and I love Rust and would choose it for every programming challenge I have from now on.

A strategy that I have reverted to in Rust is to use #[test]'s to mark my progress... I might only write one short function, verify it works with a #[test] then move on. It's almost TDD, but not quite. I started this practice back in my Java days, but have found it particularly beneficial with Rust.
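
A sketch of what that workflow can look like: one short function, then a couple of `#[test]` functions right next to it, run with `cargo test`. `parse_pair` is a hypothetical example function, not taken from any real project.

```rust
/// Parse a "key=value" pair, trimming whitespace around both parts.
pub fn parse_pair(line: &str) -> Option<(&str, &str)> {
    let mut parts = line.splitn(2, '=');
    let key = parts.next()?.trim();
    let value = parts.next()?.trim();
    if key.is_empty() {
        None
    } else {
        Some((key, value))
    }
}

// Compiled only when running the test harness.
#[test]
fn parses_a_simple_pair() {
    assert_eq!(parse_pair("host = example"), Some(("host", "example")));
}

#[test]
fn rejects_lines_without_a_separator() {
    assert_eq!(parse_pair("no separator"), None);
}
```

Larger projects usually group such tests in a `#[cfg(test)] mod tests` block; keeping them beside the function is enough for marking progress.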

scottlamb · on May 29, 2016

> A type system that rejects your code is like a teacher looking at a proof you just wrote, and tells you "this doesn't even make sense, and here's why". It may be frustrating, but this kind of feedback loop is tighter than what you would get from a REPL.

I've been trying to port a personal project to Rust recently.

Sometimes Rust's type system is too limited to understand why something is safe. Here are a couple examples:

* in C++ code, you can have a class in which one field has a reference/pointer into another which came earlier in the declaration order (and thus will be constructed first and destructed last). This is often a useful thing to do (one example: https://users.rust-lang.org/t/struct-containing-reference-to...). You can't do that in Rust. They have to be separate instance variables on some thread. If there's no thread running to own it, you have to use reference counting (Arc or Rc) or unsafe blocks. Or maybe instead of keeping a reference, have all calls take the outer struct as a context argument and use some sort of struct/lambda which knows how to find the thing you're referencing given that (this is what I'm trying in my code). Someone proposed a language change for "self-borrowing structs" (https://mail.mozilla.org/pipermail/rust-dev/2014-February/00...) but it didn't go anywhere as far as I can tell.

* in C++ code, you can loop over one instance variable and then call a private method on self which mutates a different instance variable. In Rust, you'll get errors about self being partially borrowed. I think you have to restructure the other method to not take self, which probably means grouping things into child structs. Basically Rust doesn't look across function boundaries to decide if something is safe so it has to consider this an error even if it isn't for the particular method you're calling.
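
The second point can be sketched as follows. The types are hypothetical; the idea is only that moving the mutated state into a child struct lets the compiler see that the two borrows touch different fields.

```rust
struct Counter {
    total: u64,
}

impl Counter {
    fn add(&mut self, n: u64) {
        self.total += n;
    }
}

struct Report {
    samples: Vec<u64>,
    // The mutable state lives in a child struct so it can be borrowed
    // independently of `samples`.
    counter: Counter,
}

impl Report {
    fn tally(&mut self) -> u64 {
        // A helper declared as `fn add(&mut self, n: u64)` on Report itself
        // could not be called in this loop: it would need all of `self`
        // mutably while `self.samples` is still borrowed by the iteration.
        for &s in &self.samples {
            self.counter.add(s); // borrows only the `counter` field
        }
        self.counter.total
    }
}
```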

Fundamentally, I think the choice is between these three options:

* use a garbage-collected language and not have to be explicit about these details. There's no possibility of buffer overflows or use-after-free errors but you have to pay the runtime overhead of the garbage collector. Go is clear about the costs involved: pauses up to 10 ms, 25% of all CPU cycles, and 50% of RAM (see http://golang.org/s/go14gc). Other GCed languages likely have similar costs even if they aren't stated as clearly.

* use an unsafe language like C/C++, enforce these things manually in your head and with comments, and occasionally have security problems when you screw up.

* use a safe-but-explicit language like Rust/Swift and have to "show your work" to the compiler quite a bit more, finding a different way if safety is too hard to prove.

masklinn · on June 1, 2016

> * use a safe-but-explicit language like Rust/Swift and have to "show your work" to the compiler quite a bit more, finding a different way if safety is too hard to prove.

AFAIK Swift is semantically in the garbage-collected pile, it uses (statically injected) reference counting.

I believe Chris Lattner has noted they'd left the door open for something like affine types (ownership) in the future, but that's not really their goal at the moment.

Jonhoo · on May 29, 2016

For self-referential datastructures (your first point above), using an Rc or Arc shouldn't have any overhead. I agree that it would be nice to be able to express this, but it's not really that big of a problem.

For partial borrows, there has been some work on it (https://github.com/rust-lang/rfcs/issues/1215), but I agree that this is something that's missing. That said, I very rarely run into this, and there's usually some fairly obvious restructuring I can do to make it work out.

scottlamb · on June 4, 2016

> For self-referential datastructures (your first point above), using an Rc or Arc shouldn't have any overhead. I agree that it would be nice to be able to express this, but it's not really that big of a problem.

I don't see how that could be true. It means using a separate heap allocation for each referenced piece (although the owning_ref thing steveklabnik mentioned might minimize that), a bit of extra RAM for the counter, and a bit of bookkeeping. I'm not saying the overhead is huge, but how could it be zero?
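
To make the bookkeeping concrete, here is a small sketch with invented types in which two fields share one buffer through `Rc` instead of one borrowing from the other:

```rust
use std::rc::Rc;

struct Cursor {
    source: Rc<String>,
    pos: usize,
}

struct Parser {
    source: Rc<String>,
    cursor: Cursor,
}

fn make_parser(text: &str) -> Parser {
    // One heap allocation for the string's bytes, plus one for the Rc box
    // that holds the String header and the reference counters.
    let source = Rc::new(text.to_string());
    // Cloning the Rc copies a pointer and increments the strong count.
    let cursor = Cursor { source: Rc::clone(&source), pos: 0 };
    Parser { source, cursor }
}
```

The counter increment and the extra allocation are the costs in question; they are small, but they exist.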

fwiw, I finished this section of my code, and the context approach I mentioned worked out well for me. It was just a bit of a puzzle to find a way Rust would like.
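
The actual code is not shown in the thread, so the following is only a guess at the shape of such a "context" design, with hypothetical names: the inner value stores a position rather than a reference, and every access takes the owner as an argument.

```rust
struct Document {
    lines: Vec<String>,
}

/// Instead of holding a `&String` into `Document::lines` (which would make
/// any struct containing both self-referential), remember only where the
/// line is.
#[derive(Clone, Copy)]
struct LineRef {
    index: usize,
}

impl LineRef {
    /// Every access takes the owner as a context argument.
    fn get<'a>(&self, doc: &'a Document) -> &'a str {
        &doc.lines[self.index]
    }
}

struct Editor {
    doc: Document,
    current: LineRef, // "points" into `doc` without borrowing it
}

fn current_line(editor: &Editor) -> &str {
    editor.current.get(&editor.doc)
}
```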

I'm pretty happy with Rust so far even though I've had to restructure parts of an apparently-working program to fit its model. My program is now more obviously correct, benchmarks are pretty good so far (though I wish profile-driven optimization were supported/mature: https://unhandledexpression.com/2016/04/14/using-llvm-pgo-in...), and the open source library situation seems better than C/C++ for what I'm doing and actively improving where C/C++ is stagnant.

steveklabnik · on May 29, 2016

Also, the owning_ref crate can help.

Animats · on May 26, 2016

Well, the functional crowd won. An example expression from the parent article:

    let idx = args
    // iterate over our arguments
    .iter()
    // open each file
    .map(|fname| (fname.as_str(), fs::File::open(fname.as_str())))
    // check for errors
    .map(|(fname, f)| {
      f.and_then(|f| Ok((fname, f)))
        .expect(&format!("input file {} could not be opened", fname))
    })
    // make a buffered reader
    .map(|(fname, f)| (fname, io::BufReader::new(f)))
    // for each file
    .flat_map(|(f, file)| {
      file
        // read the lines
        .lines()
        // split into words
        .flat_map(|line| {
          line.unwrap().split_whitespace()
            .map(|w| w.to_string()).collect::<Vec<_>>().into_iter()
        })
      // prune duplicates
      .collect::<HashSet<_>>()
        .into_iter()
        // and emit inverted index entry
        .map(move |word| (word, f))
    })
  .fold(HashMap::new(), |mut idx, (word, f)| {
    // absorb all entries into a vector of file names per word
    idx.entry(word)
      .or_insert(Vec::new())
      .push(f);
    idx
  });

Is there editor support for indenting this stuff?

lifthrasiir · on May 26, 2016

More readable, and IMHO more idiomatic version: (Disclaimer: never tested)

    let mut idx = HashMap::new();
    for fname in &args {
      let f = match fs::File::open(fname) {
        Ok(f) => f,
        Err(e) => panic!("input file {} could not be opened: {}", fname, e),
      };
      let f = io::BufReader::new(f);
      let mut words = HashSet::new();
      for line in f.lines() {
        for w in line.unwrap().split_whitespace() {
          if words.insert(w.to_string()) { // new word seen
            idx.entry(w.to_string()).or_insert(Vec::new()).push(fname);
          }
        }
      }
    }

People who are introduced to the functional approach for the first time seem to enjoy it so much that everything becomes a hard-to-read mess of functions. I had similar experiences with Python list comprehensions and C# LINQ.

lmm · on May 26, 2016

You can have it all though - at least, you can in Scala, Rust ought to support something similar. Something like (hybrid syntax/pseudocode):

    (for {
      fname <- args
      f = fs::File::open(fname).orElse(
        e => panic!("input file {} could not be opened: {}", fname, e))
      r = io::BufReader::new(f)
      w <- f.lines().flatMap {
          line => line.unwrap().split_whitespace()
        } .distinct
    } yield Map(fname -> Vec(w.toString))).sum

Just as concise as yours (more concise even), but easier to reason about and safer to refactor. Possibly you might need to do the mutable version for performance if this was a hot loop, but you don't get to even ask that question until you've profiled it and found out that it was the bottleneck; 99.9% of the time it isn't.

bjz_ · on May 26, 2016

That one looks to be more efficient too - all those `collect`s in the original are a tad concerning...

Animats · on May 26, 2016

Do the "collects" mean that memory is allocated and the whole collection stored temporarily, or is this all iterative generators?

(I thought I knew Rust, but I haven't used it for a year, and its usage has changed a lot even if the core language hasn't.)

steveklabnik · on May 26, 2016

collect will ... collect it all into some kind of data structure, so yeah, that version would be doing a bunch of allocation.
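
A small sketch of the difference, using a counter to observe when the closure actually runs (the function name is invented):

```rust
use std::cell::Cell;

/// Returns how many times the mapping closure had run (a) right after the
/// chain was built, (b) after pulling one item, (c) after `collect`.
fn closure_calls() -> (usize, usize, usize) {
    let calls = Cell::new(0);
    let words = ["a", "b", "c", "d"];

    let mut lazy = words.iter().map(|w| {
        calls.set(calls.get() + 1);
        w.to_uppercase()
    });
    let before = calls.get(); // building the chain does no work

    lazy.next();
    let after_one = calls.get(); // only the requested item was produced

    // `collect` drives the rest of the iterator and stores every result
    // in a freshly allocated Vec.
    let rest: Vec<String> = lazy.collect();
    let after_collect = calls.get();
    assert_eq!(rest.len(), 3);

    (before, after_one, after_collect)
}
```

So the adaptors (`map`, `flat_map`, and so on) are lazy generators, and each `collect` in the middle of a chain materializes an intermediate collection.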

k_bx · on May 26, 2016

To my taste, it's also more correct to write "for" in code which is more like statements (not expressions), and only use map/fold for stuff which is pure.

jupp0r · on May 26, 2016

Sorry, but I disagree. Although more verbose (as in more characters of source code), I could easily understand what the original code did. Your code has a high cyclomatic complexity and I have to keep all those nested for loops in mind when trying to figure out what you are doing.

I write C++ for most of my day job, and this would not pass code review because of readability problems in my team.

gpderetta · on May 26, 2016

I like to use (even abuse) lazy iterators in C++ as much as the next guy, but IMHO the imperative version in this case is not only shorter but significantly more readable.

The two loops contain a single statement and it is trivial to see what's going on. It is also easier to extend. And it is still using lazy iterators (I assume) where it makes sense, i.e. in the split_whitespace call.

alex_muscar · on May 26, 2016

I cannot upvote you enough.

lifthrasiir · on May 26, 2016

I know there is obviously lots of room for improvement. Indeed, if the code would become more complex I would immediately refactor. I can list some problems with my example:

- The error handling should really have been refactored. `try!` would make this easier. (I haven't used it since it is in the `main` function and the code was a direct replacement.)

- Error during reading words is not accurately handled. Again, `try!` would make this easier.

- `words` and `idx` look coupled to each other, which ideally shouldn't have been.

- `words.insert` is quite an opaque method; not everyone is sure if it returns true on a duplicate key or not. I've added a comment but frankly I'm not satisfied with that. <deleted> One alternative is to use `words.entry` instead, which gives a named enum variant. </deleted> Oops, `HashSet` does not have `entry`...

- Ultimately, the body of the outermost loop should go to a function.
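
On the `words.insert` point: `HashSet::insert` returns `true` when the value was not already present and `false` on a duplicate. One possible way to make that visible at the call site is a tiny wrapper with a descriptive name (the name is invented here):

```rust
use std::collections::HashSet;

/// True the first time `word` is seen, false on every repeat.
fn first_time_seen(seen: &mut HashSet<String>, word: &str) -> bool {
    seen.insert(word.to_string())
}
```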

That said, do you really think that `line.unwrap().split_whitespace().map(|w| w.to_string()).collect::<Vec<_>>().into_iter()` is a chunk of code which can be read at a glance? It at least has to be named (like T-R's example). Functional approach means that you can split functions and individually review them; the original code, IMHO, didn't.

T-R · on May 26, 2016

I think the issue with procedural loops (not to be critical, just in general) is that there's no abstraction - it's not clear what's getting mutated, or what the result is (or its type), it's harder to look at the individual steps, and it's harder to refactor.

The nice thing about functions like "map", "filter", "fold", "flatmap", etc., is that they describe intent - when you see a "map", you know it's not aggregating things, just applying a function: "map" has a distinct purpose from "fold" (and depending on how pure things are, you may not even be allowed to do anything crazy).

Aside from that, the higher-order functions have pretty intuitive algebraic laws for refactoring (like with "compose" mentioned in my other comment) - It's not clear how you'd factor code out of a nested loop without understanding the whole thing, whereas "flatmap" (a.k.a. "bind" for the list monad) has laws for refactoring it.
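
One such law is map fusion: two successive maps are equivalent to a single map of the composed function, provided both functions are pure. A sketch with made-up helper functions:

```rust
fn trimmed_len(s: &str) -> usize {
    s.trim().len()
}

fn double(n: usize) -> usize {
    n * 2
}

/// Two separate passes in the source text.
fn two_maps(xs: &[&str]) -> Vec<usize> {
    xs.iter().map(|s| trimmed_len(s)).map(double).collect()
}

/// The same computation after applying the law
/// `map(f).map(g) == map(|x| g(f(x)))`.
fn fused(xs: &[&str]) -> Vec<usize> {
    xs.iter().map(|s| double(trimmed_len(s))).collect()
}
```

The rewrite can be done mechanically, without reading what `trimmed_len` or `double` do, which is the refactoring property being described.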

lifthrasiir · on May 26, 2016

I agree with you on the general sentiment. That said, Rust is not a functional language per se; the degree of functional decomposition is thus fundamentally limited. This is partly because it tries to be efficient, and as arielb1 pointed out, ownership sometimes makes it worse.

> It's not clear how you'd factor code out of a nested loop without understanding the whole thing, [...]

You are right, loops are particularly hard to refactor. I still argue that my code is better (barring any future expansion) because it fits within a handful of lines; you cannot easily understand 100 lines of code with the cyclomatic complexity of 1, but you can often easily understand 10 lines of code with the cyclomatic complexity of 8 (e.g. triple loops). The size of code, syntax and contextual information matters as much as algebraic laws.

arielb1 · on May 26, 2016

That part is certainly extreme because of ownership. It can be refactored to

        let mut words = HashSet::new();
        for line in file.lines() {
            // bind the line first so the split words can borrow from it
            let line = line.unwrap();
            words.extend(
                line.split_whitespace().map(|s| s.to_string())
            );
        }
        words.into_iter().map(move |word| (word, f))

kzrdude · on May 26, 2016

The "idiomatic" version (I agree with that) simply works better. You can insert the typical `try!()` calls that short-circuit on error and return errors to the caller.

Retrofitting better error handling in the single expression version requires changing the shape of the whole thing, editing the expression throughout.
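
A sketch of what that looks like for the loop version. It uses the `?` operator, which later replaced `try!` with the same short-circuit behaviour. The function name and the choice to take any `BufRead` (so it can be fed an in-memory buffer) are assumptions made here, not part of the thread's code.

```rust
use std::collections::{HashMap, HashSet};
use std::io::{self, BufRead};

/// Adds every distinct word read from `reader` to `idx` under `name`.
fn index_reader<R: BufRead>(
    name: &str,
    reader: R,
    idx: &mut HashMap<String, Vec<String>>,
) -> io::Result<()> {
    let mut words = HashSet::new();
    for line in reader.lines() {
        // Hand a read error back to the caller instead of panicking.
        let line = line?;
        for w in line.split_whitespace() {
            if words.insert(w.to_string()) {
                // first time this word is seen in this input
                idx.entry(w.to_string())
                    .or_insert_with(Vec::new)
                    .push(name.to_string());
            }
        }
    }
    Ok(())
}
```

Only one line of the loop body changed; the overall shape stays the same.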

ngrilly · on May 26, 2016

Definitely more readable to my eyes.

wott · on May 26, 2016

Yes, it is both much shorter and immediately readable and understandable (without comments!), unlike all the Functional Programming style versions which were proposed in the comments.

T-R · on May 26, 2016

This is maybe true, given that Rust is intentionally multi-paradigm, but it's a pretty unfair comparison: My FP-style version deliberately made almost no changes to the original code aside from adding names - I even stated in the comment that I don't even know Rust (haven't touched it beyond knowing what "Affine types" means, and once having gotten someone else's "hello world" to compile for ARM). The imperative version you're comparing it to completely changes the semantics of the program, and is written by someone very familiar with the language.

ngrilly · on May 26, 2016

Yes. Functional programming is really useful and powerful in some situations, but sometimes the old school imperative programming is what you need. There is a reason why all cooking recipes are written in an imperative style.

agumonkey · on June 1, 2016

The upper map-full code is also badly formatted and has comments everywhere. It's true that chaining functional abstractions is tempting at first; but with a bit of balance:

    let words = |(f, file)| {
            file.lines().flat_map( |line| {
                line.unwrap()
                    .split_whitespace()
                    .map(|w| w.to_string())
                    .collect::<Vec<_>>()
                    .into_iter()
            })
            .collect::<HashSet<_>>()    // prune duplicates
            .into_iter()
            .map(move |word| (word, f)) // and emit inverted index entry
        };

    let idx = args.iter()
        .map(|fname     | (fname.as_str(), fs::File::open(fname.as_str()))) //+ (str, fd)
        .map(|(fname, f)| { f.and_then(|f| Ok((fname, f))).expect(&format!("input file {} could not be opened", fname)) })
        .map(|(fname, f)| (fname, io::BufReader::new(f)))                   //+ (str, buf)
        .flat_map(words)                                                    //+ {str}
        // absorb all entries into a vector of file names per word
        .fold(HashMap::new(), |mut idx, (word, f)| { idx
                                                     .entry(word)
                                                     .or_insert(Vec::new())
                                                     .push(f);
                                                     idx
        });                                                                 //+ {word:[filename]}

Jonhoo · on May 26, 2016

I think it'd actually be good to show both to show how versatile Rust can be in this regard. I added it to the article with a link back here. Thanks!

T-R · on May 26, 2016

Please, please, please don't associate inlining everything with functional programming. Lambda-lift and name those functions. If you name them, you can ditch most of those comments, too (by moving that information into the names). Not too familiar with Rust syntax, but (to rewrite without changing semantics) something like:

    let open_file = |fname| (fname.as_str(), fs::File::open(fname.as_str()) );

    let check_err = |(fname, f)| {
      f.and_then(|f| Ok((fname, f)))
        .expect(&format!("input file {} could not be opened", fname)) };

    let to_buff_reader = |(fname, f)| (fname, io::BufReader::new(f));

    let line_to_words = |line| {
      line.unwrap()
          .split_whitespace()
          .map(|w| w.to_string())
          .collect::<Vec<_>>().into_iter() };

    let get_file_words = |(f, file)| {
      file.lines()
          .flat_map(line_to_words)
          .collect::<HashSet<_>>().into_iter() // prune duplicates
          .map(move |word| (word, f)) };       // emit inverted index entry

    let add_to_index = |mut idx, (word, f)| {
      idx.entry(word).or_insert(Vec::new()).push(f);
      idx };

    let idx = args.iter()
      .map( compose(to_buff_reader, check_err, open_file) ) //[1]
      .flat_map(get_file_words)
      .fold(HashMap::new(), add_to_index);

[1] Does Rust have a compose function? I hope so. If not, you may need 3 successive "map"s. And stream fusion.

lifthrasiir · on May 26, 2016

> Does Rust have a compose function? I hope so.

Rust itself does not have a compose function. But a closure works fine, so `|fname| to_buff_reader(check_err(open_file(fname)))` would work. If you want to go further, a nightly Rust allows for this kind of construction:

    #![feature(unboxed_closures, fn_traits)] // that's why we cannot use stable (yet)
    
    struct Composed<F, G>(pub F, pub G);
    
    impl<F, G, T, U, V> FnOnce<T> for Composed<F, G>
            where F: FnOnce<T, Output=U>, G: FnOnce<(U,), Output=V> {
        type Output = V;
        extern "rust-call" fn call_once(self, args: T) -> V { self.1(self.0.call_once(args)) }
    }
    
    impl<F, G, T, U, V> FnMut<T> for Composed<F, G>
            where F: FnMut<T, Output=U>, G: FnMut<(U,), Output=V> {
        extern "rust-call" fn call_mut(&mut self, args: T) -> V { self.1(self.0.call_mut(args)) }
    }
    
    impl<F, G, T, U, V> Fn<T> for Composed<F, G>
            where F: Fn<T, Output=U>, G: Fn<(U,), Output=V> {
        extern "rust-call" fn call(&self, args: T) -> V { self.1(self.0.call(args)) }
    }
    
    fn main() {
        println!("{}", Composed(|x| x+3, |y| y*4)(5)); // 32
    }
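
For the common case of single-argument functions, a plain generic function on stable Rust is enough. This is a minimal sketch, not a standard library API; `compose` is defined here.

```rust
/// Left-to-right composition: `compose(f, g)(x)` is `g(f(x))`,
/// matching the argument order of `Composed` above.
fn compose<A, B, C>(f: impl Fn(A) -> B, g: impl Fn(B) -> C) -> impl Fn(A) -> C {
    move |x| g(f(x))
}
```

Longer pipelines can be built by nesting, e.g. `compose(compose(f, g), h)`. Unlike the nightly version, the result is an ordinary closure rather than a custom type that implements the `Fn` traits.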

T-R · on May 26, 2016

> a closure works fine, so `|fname| to_buff_reader(check_err(open_file(fname)))` would work

Ah, yes it would. Clearly I'm a bit too tired to be writing code, if I've overlooked function application. I suppose it doesn't have to be point-free. =)

Pretty cool, though - thanks for the info.

yoklov · on May 26, 2016

Rust is on my to-learn list (and I'll likely have to learn it for work anyway) but dear god what a horror. That is firmly in the same echelon as C++ template madness.

lifthrasiir · on May 26, 2016

While you can do lots of horrible things with generics (cough typenum [1] cough), it has a rather strict rule that prevents it from ever having the same degree of freedom as C++ template has. In particular, this entire thing is correctly type-checked and you won't see hacks like SFINAE. Probably things like this should be encapsulated in a separate crate, reviewed and maintained by the community.

[1] http://paholg.com/typenum/typenum/index.html

tatterdemalion · on May 26, 2016

I guarantee you it's a far cry from what templates will let you do.

However, it's overloading a custom type to behave like a higher order function, which is a fairly complex piece of code. Most uses of generics are much simpler.

discreteevent · on May 26, 2016

You are correct to point this out (not to associate inlining everything with FP). However a lot of functional code seems to use this style. Maybe people need to point this out more frequently and raise it in code reviews. To me it's like really bad academic English. The kind that uses the passive voice a lot and uses phrases like "the former" and "the latter", instead of naming things and using short clear sentences. Sometimes I'm convinced that the author of the code, if they were honest, would admit that they had difficulty keeping track of exactly what they were referring to while they were writing it.

T-R · on May 26, 2016

Oh, it definitely pops up a fair bit, but it really is just not-well-factored code. It's a direct parallel to having long boolean expressions, or long equations without breaking out any sub-expressions and storing them in named variables - after all, functions are just sub-expressions - which is something Code Complete specifically advocated against in procedural/OOP code.

Some people seem to fall back into it a bit when they discover point-free style (and that seems to make up a lot of what you see in mixed-paradigm code). I don't think it'd be controversial to say it's bad practice in any paradigm, so the association of it with FP is kind of like judging web programming by late 90's beginner PHP code (which, at one point in time, did describe a lot of web programming, but it was never good, and we'd like to put that behind us).

kzrdude · on May 26, 2016

Closure type inference doesn't work well enough in Rust, so your refactoring will not compile unfortunately. Rust needs the closures to be used inline.
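
One way to keep the named-steps structure is to use ordinary `fn` items, whose signatures spell out the types that un-annotated closures would leave for inference. The sketch below follows that idea under a simplifying assumption: it indexes in-memory `(name, text)` pairs rather than opening files, and all names are illustrative.

```rust
use std::collections::{HashMap, HashSet};

fn line_to_words(line: &str) -> Vec<String> {
    line.split_whitespace().map(|w| w.to_string()).collect()
}

/// Distinct words of one document, each paired with the document's name.
fn doc_words<'a>((name, text): (&'a str, &'a str)) -> Vec<(String, &'a str)> {
    let distinct: HashSet<String> = text.lines().flat_map(line_to_words).collect();
    distinct.into_iter().map(|word| (word, name)).collect()
}

fn add_to_index<'a>(
    mut idx: HashMap<String, Vec<&'a str>>,
    (word, name): (String, &'a str),
) -> HashMap<String, Vec<&'a str>> {
    idx.entry(word).or_insert_with(Vec::new).push(name);
    idx
}

fn build_index<'a>(docs: &[(&'a str, &'a str)]) -> HashMap<String, Vec<&'a str>> {
    docs.iter()
        .copied()
        .flat_map(doc_words)
        .fold(HashMap::new(), add_to_index)
}
```

Each step can now be tested on its own, at the cost of writing out the signatures.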

bigger_cheese · on May 26, 2016

The syntax doesn't look all that friendly. Even with comments on every second line of the Rust code I struggled to work out what it was doing.

I opened the external example he linked which has solutions in a number of languages. (https://www.rosettacode.org/wiki/Inverted_index) I could understand the C version of the code pretty well at a glance. Granted it was long but it looks like the intuitive way I'd write the code. I acknowledge my bias here.

I looked at some other languages on the site and both the C# example and D example look much more readable than the Rust code (and I've not used either of those languages either, I think they are intended for "system" programming as well); the code as provided in those languages seemed more intuitive.

Jonhoo · on May 26, 2016

To some extent, I think this comes down to familiarity with the functional approach to solving these kinds of problems. Certainly, before I started writing code in Rust, I also preferred the imperative approach you would take in C and Go. Over time though, I've found that I increasingly prefer this way of expressing chained computation.

That said, there's certainly a point to be made (and indeed other commenters have made it already) that this isn't code you would want to maintain. For production-level code, it'd be broken down more, so that individual pieces could be reviewed and tested in isolation. The code example here was to demonstrate the expressivity of the language more so than the One True Way of doing it.

danieldk · on May 26, 2016

> Over time though, I've found that I increasingly prefer this way of expressing chained computation.

This looks like code from the 'I found out about FP, let's apply it everywhere'-stage. For instance, if the iteration over arguments was done with a regular for...in loop, the filename would be visible in its scope and the rest of the code could be simplified due to not having to pass the filename everywhere.

Also, I don't think this demonstrates expressivity well, because the purported functional construct uses unwrap/expect (it panics). The expressivity of Rust allows you to have the whole expression have Result as its type fairly easily.
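
A small stand-alone illustration of that last point (not the article's code): collecting an iterator of `Result`s into a `Result` of a collection stops at the first error.

```rust
use std::num::ParseIntError;

/// The whole chain evaluates to a `Result`: the first failing item ends the
/// iteration and becomes the `Err`, with no `unwrap` or `expect` inside.
fn parse_all(input: &str) -> Result<Vec<i32>, ParseIntError> {
    input.split_whitespace().map(|s| s.parse::<i32>()).collect()
}
```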

(Sorry for ranting a bit, but I think that "Not only is this very readable" will tick off people. Which is sad, because it could be changed into something which is more readable but still expressive.)

Jonhoo · on May 26, 2016

I agree with your point in that this is written in an excessively functional style. Something more akin to the imperative-style solution given by lifthrasiir below might have been simpler. However, I don't think that code shows that Rust is more expressive in any way than C/Go is (which is kind of the point).

The functional code I give in the article is quite radically different from what you would/could do in those languages, and I would argue that it does show that Rust provides some interesting and flexible mechanisms that add to the expressivity you have as a programmer. Sure, it's overused in the example, but the exaggeration at least has a chance of communicating the point. Maybe the argument is that I should have used a simpler (yet still not trivial) example instead, but I couldn't think of one at the time.

Re the "Not only is this very readable" part, I completely agree. I could change it to something less, hmm, bold, but I'm not sure it would make that much of a difference at this point. Suggestions are welcome.

pjmlp · on May 26, 2016

I remember reading a nice post from someone in the Clojure community that deconstructed an almost unreadable program full of nested function declarations into something that any of us would be happy doing maintenance on.

It went through the code identifying code patterns and goals, and then making them into separate functions or data structures as appropriate.

pcwalton · on May 26, 2016

I noticed that a lot of the Rust programs on Rosetta Code use the functional style as well. You could do that in C# and D too if you wanted—it's just that the people adding Rust entries to the site prefer functional style.

Rust does not force you into the functional style, and usually I use imperative style when writing Rust code, except when the task is so simple that writing e.g. a for loop would be just noise. For example, if I just wanted to convert every string in a vector to uppercase, I'd probably just write "list.iter().map(|string| string.to_uppercase()).collect()". But if I were, say, hashing every string in the vector, I'd probably use a for loop over an iterator.
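
Both cases side by side, as a sketch. "Hashing every string" is read here as feeding all strings into one hasher, which is an interpretation; `DefaultHasher` is just a convenient standard-library choice.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Simple transformation: an iterator chain is shorter than a loop.
fn uppercase_all(list: &[String]) -> Vec<String> {
    list.iter().map(|s| s.to_uppercase()).collect()
}

/// Stateful work: a plain loop that feeds one hasher.
fn hash_all(list: &[String]) -> u64 {
    let mut hasher = DefaultHasher::new();
    for s in list {
        s.hash(&mut hasher);
    }
    hasher.finish()
}
```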

pjmlp · on May 26, 2016

If I would be using Rust I think I would also be in the FP crowd.

There are enough similarities with ML languages, and I have used so many FP concepts all the way back to Caml Light, my first FP language, that I would just naturally follow that path when writing Rust.

In Smalltalk it was just natural to mix both styles.

ngrilly · on May 26, 2016

> Rust does not force you into the functional style, and usually I use imperative style when writing Rust code, except when the task is so simple that writing e.g. a for loop would be just noise.

I use the same strategy in "dynamic" languages like Python and JavaScript, that support both an imperative and functional style.

smitherfield · on May 26, 2016

A lot of stuff in the article's code examples struck me as "hacky," although I have no idea what's considered idiomatic/performant Rust.

EugeneOZ · on May 26, 2016

It's just a question of time, I promise.

pcwalton · on May 26, 2016

I wouldn't say "the functional crowd won"—that's certainly not how I would write the code, and it's not the style that high-profile projects like Servo typically use. Rather, the author clearly prefers functional style, and Rust (like most modern languages) allows you to write in that style if you wish.

Jonhoo · on May 26, 2016

Of course. This example was given to show expressiveness. For production-level code, you would break it up more so pieces can be tested individually (as pointed out in another comment).

devin_lane · on May 26, 2016

I dunno, I wouldn't say this is as readable as the author mentions. To me, all of the chaining, syntax for closures, and Rust concepts like "into_iter" work to hide the actual algorithm, which is pretty simple. And forget testing, since this is all actually ONE top level call. I certainly wouldn't want to maintain code written like this in any language (and yeah, while you could make pretty much the above in C++ if you really wanted, I wouldn't write it that way.)

danieldk · on May 26, 2016

How I would've written it. Rust beginner, but 'been there, done that' in FP:

https://gist.github.com/danieldk/3bd3b84c1a7c8bc8c902314a488...

(Compiles and seems to work properly on a small document collection.)

Chris2048 · on May 26, 2016

> // iterate over our arguments

> .iter()

hmmm. Not far off the old

> i++ // add one to i

tatterdemalion · on May 26, 2016

These are quite different. `.iter()` is equivalent to creating the counter and declaring both the increment and conclusion condition, only it's guaranteed to correctly access each member of the collection once and to never be broken, and for many collections (e.g. arrays) it is written in a way which allows for powerful optimizations.
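
Spelled out on a slice, with hypothetical function names, the two forms look like this:

```rust
/// Hand-written counter loop.
fn sum_indexed(xs: &[i32]) -> i32 {
    let mut total = 0;
    let mut i = 0; // create the counter
    while i < xs.len() {
        // conclusion condition checked above, bounds checked on access
        total += xs[i];
        i += 1; // increment
    }
    total
}

/// The iterator bundles counter, condition and increment, and cannot
/// index out of range.
fn sum_iter(xs: &[i32]) -> i32 {
    let mut total = 0;
    for x in xs.iter() {
        total += *x;
    }
    total
}
```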

Chris2048 · on May 27, 2016

I mean in terms of comment useful-ness.

anp · on May 26, 2016

Rustfmt can clean that up, but I'm on mobile and can't run the examples through to see what it looks like.

Jonhoo · on May 26, 2016

If you compare that to many of the other solutions on Rosetta Code (https://www.rosettacode.org/wiki/Inverted_index), that's a pretty neat solution! And yes, as pointed out below, rustfmt would clean this up (mostly by increasing the indentation). I reduced indent to double spaces to make the code denser for the post.

conradev · on May 26, 2016

Rustfmt generally handles this well, yes.

Munksgaard · on May 26, 2016

> This latter point is particularly interesting; the Rust compiler will not compile a program that has a potential race condition in it.

I feel obliged to point out that this is false. Rust prevents _data races_, but not _race conditions_. You can read more in the Rustonomicon here: https://doc.rust-lang.org/nomicon/races.html
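
A sketch of the distinction in safe code (all names invented). Every access below goes through a `Mutex`, so there is no data race, yet the first function still has a race condition because it releases the lock between the check and the write.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Check-then-act with the lock released in between: two threads can both
/// observe `None` and both believe they claimed the slot.
fn claim_racy(slot: &Mutex<Option<usize>>, id: usize) -> bool {
    let empty = slot.lock().unwrap().is_none(); // guard dropped at end of statement
    if empty {
        *slot.lock().unwrap() = Some(id);
    }
    empty
}

/// Same logic with check and write under one lock: exactly one winner.
fn claim_atomic(slot: &Mutex<Option<usize>>, id: usize) -> bool {
    let mut guard = slot.lock().unwrap();
    if guard.is_none() {
        *guard = Some(id);
        true
    } else {
        false
    }
}

/// Runs `claim` from `n` threads and counts how many believed they won.
fn winners(n: usize, claim: fn(&Mutex<Option<usize>>, usize) -> bool) -> usize {
    let slot: Arc<Mutex<Option<usize>>> = Arc::new(Mutex::new(None));
    let handles: Vec<thread::JoinHandle<bool>> = (0..n)
        .map(|id| {
            let slot = Arc::clone(&slot);
            thread::spawn(move || claim(&slot, id))
        })
        .collect();
    handles
        .into_iter()
        .map(|h| h.join().unwrap())
        .filter(|&won| won)
        .count()
}
```

`winners(8, claim_atomic)` is always 1, while `winners(8, claim_racy)` can be anything from 1 to 8 depending on scheduling. Both versions compile without `unsafe`.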

Jonhoo · on May 26, 2016

In my defense, the next sentence is "Unless you explicitly mark your code as `unsafe`, your code simply cannot have data races." That said, I've updated the article text to now say data races in both places.

0xmohit · on May 26, 2016

Good to see such articles that provide an insight into various aspects of a programming language.

A couple of other beginner-friendly resources would include:

- An alternative introduction to Rust [1]

- 24 days of Rust [2]

- CIS 198: Rust Programming [3]

[1] http://words.steveklabnik.com/a-new-introduction-to-rust

[2] http://zsiciarz.github.io/24daysofrust/

[3] http://cis198-2016s.github.io/

joobus · on May 26, 2016

I'd like to know what the author considers "systems work"; I don't consider garbage-collected languages (Go, Python) "systems" languages.

pjmlp · on May 26, 2016

Then you are ignoring all the great work done at Xerox PARC, DEC/Olivetti, Royal Signals and Radar Establishment, ETHZ, AT&T, Microsoft Research which used garbage-collected system programming languages.

If the whole OS stack can be implemented in the language, ignoring the Assembly help that even C requires, it is a systems language regardless if it has a GC for heap management or not.

adrusi · on May 26, 2016

By that criterion neither Go nor Python can be considered systems languages. Maybe you could implement some basic code to bootstrap CPython in Cython or RPython, I'm not entirely sure, but neither of those is really Python. The only garbage collected languages that I'm aware of that can implement a full OS stack are LISP, Red and maybe Smalltalk.

pjmlp · on May 26, 2016

I wonder why you are disqualifying Go, given that it is fully bootstrapped as of version 1.6, yes even the GC is written in Go.

It is also a descendant of Oberon, which was used very much successfully at ETHZ during the 90's.

Every time I have this argument I wish some PhD student would bother to rewrite Native Oberon in Go.

Also you should educate yourself, a small list of GC enabled systems programming languages is:

Lisp, Algol-68RS, Mesa/Cedar, Modula-2+, Modula-3, Oberon, Oberon-2, Active Oberon, Component Pascal, Sing#, SystemC#, D.

There are plenty of papers in the ACM and SIGPLAN digital archives about the OSes written in those languages.

nickpsecurity · on May 26, 2016

As pjmlp says, Oberon systems were implemented in a GC language. The latest, A2 Bluebottle, runs fast and light despite being written in a safe, GC, system language. Easily more responsive than my Linux box. There are also GCs that can support latencies good enough for soft real-time applications.

Far as Go, it's designed to recreate the Oberon-2 experience of Pike. It's basically a modified Oberon. If Oberon can do OS's, then Go can do OS's with some modifications to runtime or something.

pjmlp · on May 27, 2016

I guess we should start forwarding Go as system language disbelievers to

https://github.com/jjyr/bootgo

http://wiki.osdev.org/Go_Bare_Bones

Although the Wiki author doesn't seem to understand how to use the unsafe.Pointer type, and makes use of external Assembly instead, but the bootgo does it properly.

nickpsecurity · on May 27, 2016

Great find! It's not quite an OS. It does demonstrate a system booting into kernel-mode, Go code. That's a start. Re-writing... even auto-translating... the Oberon system to Go still seems like best proof.

I just sent the Github author an email suggesting it as his next project. Plus a little praise for giving us the needed ammo. :)

adamnemecek · on May 26, 2016

You are probably referring to the sentence "Rust is quickly becoming my favorite language for all systems work (which is most of what I do anyway), and has largely replaced both Go, Python, and C/C++ in my day-to-day."

One could also interpret it as "Rust is quickly becoming my favorite language for all systems work [...], and has __also__ largely replaced both Go, Python, and C/C++ in my day-to-day."

Jonhoo · on May 26, 2016

That is indeed how I meant it. That said, Go is certainly being used as a systems language, and especially in the domain of networked services, it is rapidly replacing C/C++.

jguegant · on May 26, 2016

"and especially in the domain of networked services, it is rapidly replacing C/C++."

Rapidly? Where?

pjmlp · on May 26, 2016

Thanks to the adoption of Docker, Dropbox and other well known SV companies.

Even Microsoft is now contributing to the eco-system due to their collaboration with Docker. Also Joe Duffy (from MSR Midori) tweets about his Go experiences.

Personally I prefer other languages, but I am surely happy to see more Go and less unsafe languages being used.

IndianAstronaut · on May 26, 2016

Go also has the advantage of a simple syntax like Python. Moving some of our ETL processes to Go from Python has been quite easy.

Jonhoo · on May 26, 2016

I don't have any numbers to support this claim, but it's certainly true that we are seeing a lot of software being developed in Go, and network software in particular.

ngrilly · on May 26, 2016

Examples of services that typically would have been written in C/C++ but are written in Go: Docker, YouTube Vitess (a MySQL proxy), CockroachDB (a distributed and strongly consistent SQL database).

Other examples of companies using Go for network services: Dropbox (most backend), CloudFlare (Railgun and other things), Uber (geofencing), Bitly (messaging with NSQ), Disqus (realtime commenting), DigitalOcean, Heroku, SoundCloud, Dailymotion.

ngrilly · on May 26, 2016

There is also Netflix using it for a proxy:

> The decision to use Go was deliberate, because we needed something that had lower latency than Java (where garbage collection pauses are an issue) and is more productive for developers than C, while also handling tens of thousands of client connections. Go fits this space well.

Source: http://techblog.netflix.com/2016/05/application-data-caching...

Discussion: https://www.reddit.com/r/golang/comments/4l0fv2/the_netflix_...

Scarbutt · on May 26, 2016

Replacing Python is a bit of a stretch, no? I'm sure you can, but is it practical? Like for quickly hacking stuff up.

Jonhoo · on May 26, 2016

Actually, no. I've found that I'm maybe even more productive in Rust than I am in Python, even for one-off tasks. The one exception is probably plotting, but I'm sure a Rust library for that isn't far off.

jhasse · on May 26, 2016

What's your replacement for Python REPLs like bpython or ptpython?

Jonhoo · on May 26, 2016

I barely, if ever, use/need a REPL. The shell has all the tools I need for short, one-off tasks.

Scarbutt · on May 26, 2016

Impressive, I'll have to check it out :)

pjmlp · on May 26, 2016

Once upon a time I used Python for systems administration tasks and that was a long time ago, around 2003.

Python doesn't offer much against languages that compile to native code, have type inference (if strong typed) and REPL.

The only reason for me, would be if I am required to make use of a framework or library that requires me to use Python.

lmm · on May 26, 2016

I use Scala for that kind of use case sometimes. If you have a REPL and you have good enough type inference you can pretty much just do what you would do in Python, and as a bonus you can reuse library functions from your real system rather than having to duplicate or shell out to them from your scripts

IndianAstronaut · on May 26, 2016

I find getting stuff up and running with Go is quite easy. Just set up a small package with some quick hacks and then a separate file to run the package functions.

sanderjd · on May 26, 2016

"Systems" has come to include things like networked services (eg. HTTP and DNS servers), which you can certainly do successfully in GC'd languages in many circumstances. Personally, I'd just like to see a widely agreed upon definition, so that we can all stop being confused when this comes up.

LionessLover · on May 26, 2016

Naming confusion is completely normal. Even in fields like learning human anatomy you'll learn different words for some things depending on who your teacher is, even though one would think that there should have been sufficient time for such issues to settle and the professionals agree on one word. But naming always is an accident of space and time - where did it happen, when did it happen, and then the path from there. Language is dynamic and a little fuzzy - in the sciences as well. You can define all you want, the problem always is other people, who choose their own definitions. It's fun - it keeps you on your toes :-)

In this context, for example, I don't see a reason for a world government to impose one definition of "systems programming" worldwide by military force (this is what it would take!). Just like in almost everything in human language, you get it from the context. You should try to teach a computer to recognize human speech including the meaning, then you'll realize that almost all of it relies on context and pre-existing knowledge. "Precision" comes from the interpretation - human language communication is the nightmare of people who love functional programming, it's full of hidden state and context and active interpretation.

Which is why it's so easy for this to happen: http://dilbert.com/strip/2015-06-07 If someone wants to argue, there is no way to provide a water-tight human-language text that "Dick from the Internet" can't attack. There always is a way to mis-interpret human language.

sanderjd · on May 26, 2016

Thanks, this is a great response. I shall henceforth embrace the fun and swear to never again be lured by the superficial temptations of naming hegemony!

pjmlp · on May 26, 2016

Here is the Project Oberon source code, implemented in a GC enabled systems programming language.

http://people.inf.ethz.ch/wirth/ProjectOberon/index.html

All the way from the hardware Verilog to the GUI.

This is the 2013 re-edition of the initial project. Later versions of the operating system with its 3D Gadgets framework were quite nice to use.

It doesn't get more systems programming than that.

nickpsecurity · on May 26, 2016

Oh no, it gets better: FPGA-Oberon uses an Oberon variant to straight-up synthesize hardware.

http://vbn.aau.dk/ws/files/58355126/main.pdf

So, one can do Oberon all the way down until we need custom cells. ;)

pjmlp · on May 27, 2016

Cool, I didn't know that.

AnimalMuppet · on May 26, 2016

I suspect that "systems" is on the way to meaning "what you'd use to write anything other than CRUD web apps".

adiabatty · on May 26, 2016

One popular, reasonable definition for "systems language" is "used for writing programs that don't have a GUI" (other than web-based ones).

Another popular, reasonable definition for "systems language" is "something one could write an OS in".

samuellb · on May 26, 2016

I'd call a systems programming language "something which can use the low level ABI of the underlying platform". So C and e.g. FreePascal would fit the description (for most hardware platforms). And Java, for instance, would not unless the platform runs Java bytecode.

unimpressive · on May 26, 2016

I think the latter is a more reasonable definition than the former, since I've written plenty of command line applications in Python. And as much as I like Python, if it qualifies as a 'system language' then just about anything does.

Perhaps it could be amended "Primarily used for writing programs without a GUI."

Jonhoo · on May 26, 2016

I'm slightly hesitant to agree with both of these. The first seems wrong to me, because Bash qualifies, whereas Tcl doesn't. The second is better, though I think it excludes some interesting languages, like Scala, which you probably wouldn't write an operating system in, but that you could certainly build "systems" in (e.g., Spark).

accnumnplusone · on May 26, 2016

This discussion keeps coming up in the Go world. As with many words "systems" has multiple uses eg operating systems and enterprise systems. I don't understand the confusion.

otoburb · on May 26, 2016

The author wrote further down in the article under the heading "Performance Without Sacrifice":

>>Unfortunately, higher-level languages are often not a great fit for systems code. Systems code is often performance critical (e.g., kernels, databases), so the developer wants predictable performance, and tight control over memory allocation/de-allocation and data layout. This can be hard to achieve in higher-level languages or when using a garbage collector.

ustolemyname · on May 26, 2016

Others don't consider languages which do not handle OS signals a "systems" language: https://github.com/rust-lang/rfcs/issues/1368

pjmlp · on May 26, 2016

Which also disqualifies C, because many of the language features that the C crowd uses to argue against GC enabled systems programming aren't part of the C standard, but rather compiler extensions.

In a strict ANSI C compliant compiler, signal handling is implemented in Assembly.

A systems programming language is one that can be used to write a full OS stack, regardless of how heap management is done and how much Assembly help is needed.

kibwen · on May 26, 2016

I'm unsure what this is implying. Rust is fully capable of handling signals, it just doesn't provide any nice APIs for such in the standard library.

SiVal · on May 26, 2016

Is there a roadmap for what is currently planned for the standard library? A lot of language choice these days comes down to what high-quality batteries are included and what will require either trusting some unknown 3rd party's github project or your own homemade version.

Gankro · on May 26, 2016

The first-party stuff in the pipeline is basically: https://github.com/rust-lang-nursery

Off the top of my head, Time and SIMD are the only two first-party efforts that are missing from there.

Things which will, at this rate, probably never be added -- either because there isn't a "standard" solution to these problems or because it doesn't seem worth it:

* async io

* web framework/http

* numerical hierarchies and bignums

* linear algebra / stats

* GUIs

* "human" time / calendars

kibwen · on May 26, 2016

Eh, considering that TCP support already exists in the standard library, I could see HTTP also being added someday if there were sufficient demand.

Bignums in general are a bit too open-ended, but I could see bigints at least being added someday.

And as for async io, I think it's the same story as with HTTP: imaginably included contingent on implementation maturity and user demand.

Jonhoo · on May 26, 2016

Not entirely sure what you mean here? Even Bash can handle (some) OS signals, but I don't think anyone considers that a systems language?

ustolemyname · on May 26, 2016

Well, maybe not Bash, but it seems a dozen years ago referring to Perl as a systems language would not have caused anybody to blink ;)

I apologize for how glib my remark above was, it didn't add much to the conversation, and was more a knee jerk response to the notion that "Systems Language" is now, or ever has been, well defined.

I brought up the handling of signals, because process management is, in practice, a pretty big deal in what I would consider systems programming. But I don't think it's important to all systems programming. Much like I don't think avoiding garbage collection is necessary for all systems programming, though it's clearly an issue for some applications.

What I do find odd is the notion that it is necessary to not have garbage collection, but not having a solid story behind handling signals from the OS receives a pass.

Rust can handle signals, but its implementation is very platform specific (even the currently recommended crate lacks Windows support), and ultimately calls out to the FFI. So it seems that saying Rust has support for signals is like saying Go supports manual memory management because you can call C.malloc and C.free...

Jonhoo · on May 26, 2016

Author of the post here. Curious that this got posted again. Was originally posted as https://news.ycombinator.com/item?id=11773332. Can the posts be merged by a mod somehow?

jeffdavis · on May 26, 2016

I really like my experience with rust so far also, but a few caveats:

* try!() is pretty annoying

* Working effectively with C in non-lexical ways seems to involve some unstable libraries and still requires nightly rust

* Macros are safer, but can't do some things that C macros can. For instance, they are hygienic, which means you can't conjure up new identifiers. For that, you need a syntax plugin, which is very powerful but the APIs aren't stable yet. This goes to the previous point.

* A few annoyances, like warning when you don't use a struct field as "dead code". If I'm interfacing with C I probably need that struct field whether the rust compiler sees it or not, but I don't want to disable all dead code warnings for that.
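To make the macro point above concrete, here is a minimal sketch of what hygiene means for `macro_rules!`. The macro names (`times_ten`, `make_getter`) are made up for illustration. Local bindings inside a macro do not clash with the caller's, and since stable `macro_rules!` has no way to paste tokens into a fresh identifier, the usual workaround is to let the caller pass the name in.

```rust
// Hypothetical macros, for illustration only.

// The `factor` declared inside the macro is hygienic: it cannot capture or
// shadow a `factor` that appears in the caller's expression.
macro_rules! times_ten {
    ($e:expr) => {{
        let factor = 10;
        $e * factor
    }};
}

// `macro_rules!` cannot build a name like `get_` + something on stable Rust,
// so the caller supplies the full identifier instead.
macro_rules! make_getter {
    ($fn_name:ident, $ty:ty, $value:expr) => {
        fn $fn_name() -> $ty {
            $value
        }
    };
}

make_getter!(answer, u32, 42);
make_getter!(greeting, &'static str, "hello");

fn main() {
    let factor = 2;
    // The caller's `factor` (2) is used in the argument, the macro's (10) in
    // the multiplication: (2 + 1) * 10.
    println!("{}", times_ten!(factor + 1)); // prints 30
    println!("{} {}", answer(), greeting());
}
```

A C macro doing the same textual substitution would have let the inner `factor` shadow the caller's, which is exactly the class of bug hygiene rules out.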

tatterdemalion · on May 26, 2016

You can attach the allow(dead_code) attribute to the struct - or even the individual field - which will scope it to that declaration only. Still annoying when you have a lot of them, but better in my opinion than not having dead code warnings on fields, since the use case you describe is the minority of Rust structs that are declared.
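A small sketch of the field-level scoping, assuming a made-up `Header` struct: the attribute silences the lint for `reserved` only, while every other field is still checked.

```rust
// Hypothetical struct; the names are invented for this example.
struct Header {
    tag: u32,
    // Only this field is exempt from the dead-code lint. An unused `tag`
    // would still produce a warning.
    #[allow(dead_code)]
    reserved: u32,
}

fn tag_of(h: &Header) -> u32 {
    h.tag
}

fn main() {
    let h = Header { tag: 7, reserved: 0 };
    println!("{}", tag_of(&h));
}
```

Putting `#[allow(dead_code)]` on the `struct` line instead would cover all of its fields at once.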

jeffdavis · on May 26, 2016

Maybe there should be an annotation saying that the layout of a struct is important for ABI compatibility, and it would silence warnings related to the layout.

tatterdemalion · on May 27, 2016

C FFI structs have to be marked `#[repr(C)]` to guarantee they'll be represented how C structs would be. It might be a reasonable change to implicitly allow dead fields on any struct with a C repr (since using the struct in C effectively makes all of its fields public).
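As a rough illustration of what `#[repr(C)]` buys you, the sketch below mirrors a hypothetical C struct and prints its size. The `Packet` type and its C counterpart are assumptions for the example, and the numbers in the comments hold on common targets where `u32` is 4-byte aligned.

```rust
use std::mem::{align_of, size_of};

// Hypothetical struct mirroring a C declaration such as:
//   struct packet { uint8_t kind; uint32_t length; uint16_t flags; };
// `repr(C)` keeps the fields in declaration order with C padding rules.
#[repr(C)]
pub struct Packet {
    pub kind: u8,
    pub length: u32,
    pub flags: u16,
}

fn main() {
    let p = Packet { kind: 1, length: 512, flags: 0 };
    // C layout: 1 byte + 3 padding + 4 bytes + 2 bytes + 2 trailing padding.
    println!(
        "kind={} length={} flags={} size={} align={}",
        p.kind,
        p.length,
        p.flags,
        size_of::<Packet>(),
        align_of::<Packet>()
    );
}
```

Without the attribute the compiler is free to reorder fields, so the size and offsets are not something C code could rely on.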

Sean1708 · on May 27, 2016

Which unstable libraries are you thinking of?

I think you can silence the dead code warnings by either marking the field `pub` or prefixing it with an underscore.

kjaleshire · on May 26, 2016

The `?` operator appended to function calls makes them behave like `try!()`. It's only in nightly currently, on its way to being stabilized.
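For comparison, a minimal sketch of the two styles, assuming a compiler where `?` is available. The function names are made up. The first function spells out roughly what `try!()` expands to, and the second uses `?`.

```rust
use std::num::ParseIntError;

// Spelled-out early return, roughly what `try!(expr)` expands to.
fn parse_sum_verbose(a: &str, b: &str) -> Result<i32, ParseIntError> {
    let x = match a.parse::<i32>() {
        Ok(v) => v,
        Err(e) => return Err(From::from(e)),
    };
    let y = match b.parse::<i32>() {
        Ok(v) => v,
        Err(e) => return Err(From::from(e)),
    };
    Ok(x + y)
}

// The same logic with `?`: on `Err`, return early from the function.
fn parse_sum(a: &str, b: &str) -> Result<i32, ParseIntError> {
    Ok(a.parse::<i32>()? + b.parse::<i32>()?)
}

fn main() {
    println!("{:?}", parse_sum_verbose("40", "2"));
    println!("{:?}", parse_sum("40", "2"));
    println!("{}", parse_sum("40", "two").is_err());
}
```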

jeffdavis · on May 26, 2016

Awesome, I didn't know that got committed! Hopefully stable soon.

zimbatm · on May 26, 2016

Is rust ever going to re-introduce the N:M model again ? For services which need to handle 1M connections the system threads are too expensive and mio brings back the callback hell.

lifthrasiir · on May 26, 2016

Have you tried mioco [1]? It looks fine enough to avoid the callback hell.

[1] https://github.com/dpc/mioco

zimbatm · on May 26, 2016

Thanks. So no clear winner yet. Do libraries have to be re-written to support the coroutines?

GolDDranks · on May 26, 2016

MIO is a very low-level library (as it tries to be as zero-overhead as possible), and there's multiple nicer abstractions written on top of it. Check this: https://github.com/carllerche/mio#libraries

eddyb · on May 26, 2016

mio has nothing to do with callbacks in its design, unless you build such an abstraction yourself on top of it.

However, without an ergonomic way to create state machines (i.e. generators), it's hard to use at all in the intended fashion (small per-connection state instead of large boxed closures or coroutine/thread stacks).
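To give a feel for what "small per-connection state" looks like when written by hand, here is a sketch of an explicit state machine for a line-based protocol. It does not use mio's API at all. The `Conn` type and `on_readable` method are hypothetical, and the event loop that would feed it bytes is left out.

```rust
// Hypothetical per-connection state; this is not mio's API.
#[derive(Debug, PartialEq)]
enum Conn {
    // Still collecting bytes until a newline arrives.
    ReadingLine { buf: Vec<u8> },
    // A full line has been received.
    Done { line: String },
}

impl Conn {
    fn new() -> Conn {
        Conn::ReadingLine { buf: Vec::new() }
    }

    // Called whenever the event loop reports readable bytes.
    // Consumes the old state and returns the next one.
    fn on_readable(self, data: &[u8]) -> Conn {
        match self {
            Conn::ReadingLine { mut buf } => {
                for &b in data {
                    if b == b'\n' {
                        let line = String::from_utf8_lossy(&buf).into_owned();
                        return Conn::Done { line };
                    }
                    buf.push(b);
                }
                Conn::ReadingLine { buf }
            }
            // Once done, further input is ignored in this sketch.
            done => done,
        }
    }
}

fn main() {
    let conn = Conn::new().on_readable(b"hel").on_readable(b"lo\nrest");
    println!("{:?}", conn);
}
```

The only memory held per connection is the enum itself plus its buffer, with no stack or boxed closure. The cost is that every protocol step has to be written as an explicit state, which is the ergonomics problem generators would address.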

Jweb_Guru · on May 27, 2016

Rust's M:N design was not going to scale to 1M connections, and it's hugely unlikely that you need 1M simultaneous connections.

georgewsinger · on May 26, 2016

If I'm not hacking on something super low-level, like hardware or an OS, then should I still try Rust? Why not stay within super high-level/expressive programming languages like Haskell/clojure?

I ask because a lot of extremely smart people I know like Rust.

nercury · on May 26, 2016

Rust is good.

Great tooling: Libraries can be written, published and used quickly. Code can be easily rebuilt on upcoming Rust versions without uninstalling anything. Integrated testing and documentation generation. Cross-compilation.

Truly cross-platform. As an example, Rust even supports both GNU and MSVC targets on Windows, that means Rust libraries can be linked into the C++ projects compiled with MINGW or MSVC. All the standard library features are cross-platform, unless namespaced otherwise.

Rust's linear type system requires variables to be either uniquely mutable or immutable, but not both. This solves a whole class of resource management problems (memory being the most important) in the simplest possible way. This compile-time guarantee also works for all references to data, which become thin pointers at runtime. This, together with Send-able and Sync-able types, also ensures no data races.
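A short sketch of both halves of that claim, with a made-up `parallel_count` helper: the borrow checker rejects mutation while a shared borrow is live, and shared mutable state across threads has to go through types that are `Send` and `Sync`, here `Arc<Mutex<_>>`.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

// Hypothetical helper: several threads increment one shared counter.
// A plain `&mut usize` could not be handed to more than one thread,
// so the counter is wrapped in Arc<Mutex<_>>.
fn parallel_count(threads: usize, per_thread: usize) -> usize {
    let counter = Arc::new(Mutex::new(0usize));
    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let counter = Arc::clone(&counter);
            thread::spawn(move || {
                for _ in 0..per_thread {
                    *counter.lock().unwrap() += 1;
                }
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    let total = *counter.lock().unwrap();
    total
}

fn main() {
    let mut v = vec![1, 2, 3];
    {
        let first = &v[0]; // shared, immutable borrow of `v`
        // v.push(4);      // would not compile while `first` is still in use
        println!("{}", first);
    }
    v.push(4); // fine: the shared borrow has ended
    println!("{:?}", v);
    println!("{}", parallel_count(4, 1000)); // prints 4000
}
```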

Language is up-to-date. No-nulls, pattern matching, closures, generics, attributes will feel familiar to programmers coming from various languages. Language had many iterations and changes, and every bit of it was designed with care. Right now it may be a bit explicit, but sugar can always be added later if it appears worth it.

Stable release cycle, stability guarantee. We know when new beta will be renamed to stable and we know what's in it! That means there is time to check all ecosystem for possible breakage, even though historically Rust introduced hardly any breaking changes since 1.0 a year ago.

steveklabnik · on May 26, 2016

Rust still tries to be expressive, even with its low-level nature. I have heard of people saying that they have started writing Rust instead of Python, because they feel like it's close enough in expressiveness most of the time, yet much faster.

Mihies · on May 26, 2016

One thing I am missing is dependency injection/ioc. How does one effectively unit test without it?

lmm · on May 26, 2016

Inject dependencies via constructor parameters - this is best practice in any language. Unit testing you can just... pass in the values you want.
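In Rust this could look roughly like the sketch below, where the dependency is a trait and the constructor is generic over it. All names (`Clock`, `SystemClock`, `FixedClock`, `Greeter`) are hypothetical, and the hour is computed in UTC to keep the example short.

```rust
use std::time::{SystemTime, UNIX_EPOCH};

// The dependency, expressed as a trait.
trait Clock {
    fn now_secs(&self) -> u64;
}

// Production implementation backed by the system clock.
struct SystemClock;

impl Clock for SystemClock {
    fn now_secs(&self) -> u64 {
        SystemTime::now()
            .duration_since(UNIX_EPOCH)
            .map(|d| d.as_secs())
            .unwrap_or(0)
    }
}

// Test double that always reports the same instant.
struct FixedClock(u64);

impl Clock for FixedClock {
    fn now_secs(&self) -> u64 {
        self.0
    }
}

// The component under test receives its dependency through the constructor.
struct Greeter<C: Clock> {
    clock: C,
}

impl<C: Clock> Greeter<C> {
    fn new(clock: C) -> Self {
        Greeter { clock }
    }

    fn greeting(&self) -> &'static str {
        let hour = (self.clock.now_secs() / 3600) % 24; // UTC hour
        if hour < 12 { "good morning" } else { "good afternoon" }
    }
}

fn main() {
    println!("{}", Greeter::new(SystemClock).greeting());
    println!("{}", Greeter::new(FixedClock(9 * 3600)).greeting());
}
```

A unit test just builds `Greeter::new(FixedClock(...))` with whatever time it wants, and no container is involved.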

bwindels · on May 26, 2016

You can, of course, always do Pure/Poor man's DI [1].

[1] http://blog.ploeh.dk/2012/11/06/WhentouseaDIContainer/

Jonhoo · on May 26, 2016

You can still do dependency injection in Rust..? There even exists a crate for it: https://github.com/Nercury/di-rs. There's also some further discussion here: https://users.rust-lang.org/t/how-do-you-implement-dependenc...

Mihies · on May 26, 2016

I saw that but it seems really complicated for something that should be straightforward. Or it just looks so? (coming from c#)