Is there some algorithm in the Rust compiler with inherently bad big-O? There were compilers in the 90s which ran a million lines per second (on much slower computers). Incremental compilation feels like working around the problem rather than solving it, and I have to imagine the complexity and maintenance burden is much worse.
I'm sure some of the LLVM optimization passes are expensive, but those are used in clang++ too, and it's not terribly slow.
> There were compilers in the 90s which ran a million lines per second (on much slower computers).
Those compilers performed nowhere near the level of optimization that modern compilers do.
> I'm sure some of the LLVM optimization passes are expensive, but those are used in clang++ too, and it's not terribly slow.
clang++ is sure slow if it has to rebuild an entire codebase from scratch.
The main reason that C++ compilers get away with not having incremental compilation is that the language forces programmers to divide code into compilation units manually, through separate .h and .cpp files. Rust doesn't have this, except at the crate level, so it needs incremental compilation instead. It turns out that incremental compilation is better in the first place, because manual compilation units tend to be fairly coarse-grained: .cpp files frequently grow very large.
> Those compilers performed nowhere near the level of optimization that modern compilers do.
I think this gets at Wirth's law a bit. Yes, compilers perform lots of optimizations now, but in Rust's case it is at least partly because they need to in order to claw back the performance that Rust gives up with its abstractions (as seen in debug builds).
I do much prefer Rust's safe abstractions to writing more direct code and hoping I get it right. I hope (and do believe) that the MIR effort will somewhat rein in the burden shunted over to LLVM, as it is super painful for some of us.
I think Proebsting's law is also relevant. Compilers did an impressive job of optimization in the 90s, and while there have certainly been advancements in the last 18 years, it's foolish to overestimate the progress.
Also, considering how much hardware changes have improved performance (speculative execution, register renaming, etc...), it wouldn't surprise me if many compiler optimizations have become less important with time. For instance, peephole instruction scheduling has to matter less than it once did.
clang++ is slow, but it certainly feels fast compared to rustc (and I typically use the everything-in-one-include C++ style). I'm curious: has there been any outreach to the clang community about rustc performance issues?
Is it they who didn't "budge an inch", or you who, despite a reasoned explanation, insist on the original accusation?
Not to mention the underlying insinuation that the Rust compiler developers are dumb for not being able to match those 90s compilers' speed -- since you don't seem to accept that this is a byproduct of the advanced optimizations and safety guarantees they provide.
My criticisms are meant to point out areas for actual improvement.
Try to be scientific. If you've got a conjecture, test it and see if it holds water. One excuse was that C++ is only faster because people build C++ projects with separate compilation units. Well, you can compile a project in C++ with a single compilation unit, and it's not horribly slow. Now you need a new conjecture.
The type system is comparable to Standard ML from the 70s. The lifetime analysis is a solid improvement over mainstream languages, but I have a tough time believing it needs to be an order of magnitude slower than const correctness. The optimizations are on par with C++ and Fortran (mostly because they're leveraging the guts from an existing C++ compiler). SML, C++, and Fortran do not compile as slowly as Rust. There's probably another reason.
I don't think the Rust compiler developers are dumb, and I never said that. I think several of them are brilliant but young (and occasionally very arrogant).
Rust is so close to being a great replacement for C++. I read every one of these threads about new releases hoping they'll eventually fix the problems. Compilation speed and memory use are lesser issues, but they're still important, and things like incremental compilation dodge the issues instead of fixing them.
Borrow checking is a lot more involved than const correctness. It's not even in the same league. Const correctness just adds a few more types, whereas borrow checking involves generating sets of constraints and solving them with a constraint solver, something C++ doesn't really have to do at all.
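To make that concrete, here's a tiny example of my own (not from the compiler itself) of the kind of constraint set the borrow checker has to generate and solve:

```rust
fn main() {
    let mut v = vec![1, 2, 3];
    let first = &v[0]; // shared borrow of `v` begins here
    v.push(4);         // error[E0502]: cannot borrow `v` as mutable
                       // because it is also borrowed as immutable
    println!("{}", first); // the shared borrow is still live here
}
```

The compiler has to compute how long the borrow held by `first` lives and check every later use of `v` against it; const correctness requires no such dataflow analysis.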
The type system of Rust goes significantly beyond that of Standard ML, because of traits/typeclasses among many other features. Also, Standard ML is usually not implemented with monomorphization, which affects compile time.
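For a sketch of what monomorphization costs (my own toy example): every distinct instantiation is a separate function that has to be generated and optimized.

```rust
// A single generic definition...
fn largest<T: PartialOrd>(items: &[T]) -> &T {
    let mut max = &items[0];
    for item in items {
        if item > max {
            max = item;
        }
    }
    max
}

fn main() {
    // ...but two instantiations: the compiler emits and optimizes
    // separate machine code for largest::<i32> and largest::<f64>.
    println!("{}", largest(&[1, 5, 3]));
    println!("{}", largest(&[1.0, 5.0, 3.0]));
}
```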
> Compilation speed and memory use are lesser issues, but they're still important, and things like incremental compilation dodge the issues instead of fixing them.
Incremental compilation is a way of addressing the compile time and memory usage issues. It's just not the way you seem to want them to be fixed. Nobody did incremental compilation for fun; it was done because without it we didn't believe we could reach our compiler performance goals.
Again, you're asking for a magic bullet, and I'm saying it's not likely that there is going to be one. If one existed, it probably would have been found already. There's lots of hard work that has been done, and lots of hard work ahead.
If I had to guess what the most important issue is, it's that idiomatic Rust generates a lot more LLVM IR than idiomatic C++ does, because it leans on abstractions for safety and developer ergonomics. Think about how many more closures Rust code uses, what for loops expand to, what pointer arithmetic compiles to (ptr::offset), etc...
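To illustrate the for-loop point (this is a sketch of the desugaring, not the compiler's exact output): before LLVM ever sees it, a simple loop has already become an iterator protocol dance.

```rust
fn main() {
    let n = 10;
    // Roughly what `for x in 0..n { println!("{}", x) }` expands to:
    let mut iter = std::iter::IntoIterator::into_iter(0..n);
    loop {
        match iter.next() {
            Some(x) => println!("{}", x),
            None => break,
        }
    }
}
```

All of that structure gets handed to LLVM to flatten back into the counted loop a C compiler would have started with.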
I'm not going to dive into your points about constraint solvers, type classes, or "monomorphization". Unless those things take more than a few percent of the compile time, they're just more red herrings.
> Incremental compilation is a way of addressing the compile time and memory usage issues. It's just not the way you seem to want them to be fixed. Nobody did incremental compilation for fun.
I'm well aware that my opinion doesn't matter, but I doubt incremental compilation will make any real strides towards making things as good as they could be. It sounds like the real problem is here:
> If I had to guess what the most important issue is, it's that idiomatic Rust generates a lot more LLVM IR than idiomatic C++ does
I suspect every single feature in idiomatic Rust has a one-to-one translation to some bit of C++ that isn't too horrible. You can do lambdas, iterators, generators, or whatever else it takes to do Rust style for loops and closures in C++, and it won't choke the compiler. So while I don't doubt rustc is generating a lot more IR, I doubt it needs to.
Maybe the MIR optimization pass will move in that direction.
I'm simply pushing back on the idea that there's some obvious low-hanging fruit that clang did that rustc didn't. Compiler performance has more or less been the #1 goal for well over a year now. The Rust compiler team (which hasn't included me for years, by the way) is fantastic. We'd all like for there to be some magic bullet that makes the compiler faster, but at this point I don't believe it exists. It's hard work from here on out.
> I'm simply pushing back on the idea that there's some obvious low-hanging fruit that clang did that rustc didn't. [...] magic bullet
I didn't suggest low-hanging fruit or a magic bullet. If anything, incremental compilation is being promoted as that kind of thing, and I just wanted to call it into question.
Nobody thinks incremental compilation is a magic bullet! The article specifically says "This is still not the end story for compiler performance generally, nor incremental compilation specifically."
It's not ironic. I see this kind of thing all the time. Rather than clean up and speed up an implementation, engineers will make it multi-threaded or send it off to a GPU, generally achieving limited improvements and greater complexity. In this case, rather than make the compiler faster, they hack it up with incremental compilation. It's still slow, but they apply the slow code to less data and call it a win.
Nobody who reads the rustc commit history would agree that this is what is going on. The number of straightforward changes that are constantly made in the service of small performance improvements far outweighs the work on incremental compilation.
The memory safety checking (or more generally, the type checking) isn't what makes the compiler slow. That part is pretty fast because it's intentionally modular.
It's the cross-module optimizations that make the compiler slow. Those optimizations are in service of C++-inspired ideal of being able to use layers of abstraction to make your code cleaner, but having the compiler flatten them down so your program still runs fast.
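As a concrete example (my own, not from the parent): this pipeline is expected to compile down to roughly the same machine code as a hand-written accumulator loop, and it's LLVM's job to get it there.

```rust
fn sum_even_squares(v: &[u32]) -> u32 {
    // Three layers of iterator adapters that the optimizer must
    // inline and fuse back into a single loop.
    v.iter().map(|x| x * x).filter(|x| x % 2 == 0).sum()
}

fn main() {
    assert_eq!(sum_even_squares(&[1, 2, 3, 4]), 20); // 4 + 16
}
```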
It's not a big-O thing. The optimization passes aren't necessarily huge either; right now, 50% of the time is in LLVM compiling IR -> machine code. The static checks, optimization passes, and everything else are minuscule overall. You can see this with -Z time-passes.
We have some hunches, but nothing super conclusive yet; basically, right now it's all about how much IR we generate.
Don't forget the differences in compilation model; C/C++'s compilation units are much smaller, so incremental isn't as big of a deal.
Is there a way to generate IR which is easier on LLVM? The Jai compiler compiles 60kloc with an LLVM backend in seconds. Could performance like this be seen in rustc?
> Don't forget the differences in compilation model; C/C++'s compilation units are much smaller
Not everyone compiles C++ the same way. I prefer header-only libraries and a single compilation unit. Counting lines of code with "gcc -E" to get the headers, I just compiled 20,000 lines of C++ from scratch in 0.6 seconds with gcc and 0.4 seconds with clang. I don't have a comparable project in Rust.
No idea, maybe a lot from the system headers. However, it was 20,000 lines of code that survived the preprocessor, 11,000 of which were inline or template from the project itself.
- A large fraction of compiler time is spent on code generation; how many of those lines did the compiler actually generate code for? (i.e. not unused functions or uninstantiated templates)
(And in the case of uninstantiated templates, a C++ compiler doesn't even need to type check them…)
> how many of those lines did the compiler actually generate code for?
Not many. How many lines would a comparable Rust program have to instantiate? (should be the same)
All I was trying to do was make an apples to apples comparison by discrediting the implication that the clang++ and g++ compilers are only faster because people use separate compilation units.
I think it's fair to compare Rust and C++ this way, and if Rust is slower to compile, then it's fair to ask why.
> Did you have optimizations enabled?
Yes, -O3. Also -Wall -Wextra and -fwrapv if you're interested.
Oberon certainly compiles quickly, and I believe Delphi (Pascal-ish) was fast too, but I was thinking of Microsoft's Java compiler before they got spanked.
Java without generics used to compile pretty fast.
When you add generics, lifetimes, and type inference, the amount of work the compiler does grows significantly. It allows the compiler to check much more interesting invariants, leading to the "if it compiles, it runs correctly" effect.
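For instance (a minimal example of my own), the lifetime analysis rejects this outright, where a C compiler would happily return a dangling pointer:

```rust
fn dangling() -> &'static str {
    let local = String::from("hello");
    &local // error[E0515]: cannot return reference to local variable `local`
}

fn main() {
    println!("{}", dangling());
}
```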
Basic generics in Rust don't have that much overhead; it sounds like you're conflating generics with templates. Generics aren't Turing complete, unlike C++'s templates.
Other languages with powerful generics and type inference like modern C# clearly manage just fine, too.
Yes, but things like `fn do_something<T>(t: T) -> impl Future<Output = ()> where T: Serializable` definitely take a while.
And C# has the added benefit of being JIT'ed: it doesn't have to flatten all the generic code during compilation; it can optimise later.
In my experience (with Rust programs that take 20mins+) it is all about the monomorphisation of generics. The great majority of the time is spent in LLVM, because Rust hands over many GB of LLVM IR for optimization, having done not very much (edit: in the way of code transformation) other than direct instantiation of the generic parameters.
Rust does have the opportunity to do more, and I suspect they are hoping to do more with MIR, but at the moment they rely on LLVM to optimize out all of the zero-cost abstractions that they introduce.
If you're referring to the comment starting with "It's not a big-O thing," I think that comment says the opposite. "Generics, lifetimes, and type inference" aren't "static checks and optimization passes" (within rustc)—he says the bulk of the work is LLVM dealing with the quantity of IR rustc generates, which is exactly what you'd expect from having heavy source-level abstractions like generics and type-heavy patterns like libstd's approach to iterators that all need to be compiled out.
Edit: so, in the bit below about MIR vs non-MIR borrow checking, I went and asked niko. And he told me that -Z time-passes is pretty much misleading now. Gah. I'm leaving the comment below because I put a lot of work into it, but apparently it may not be correct.
https://github.com/rust-lang-nursery/rust-forge/blob/master/... is how you're supposed to do this these days. It's a ton of work that I don't have time to do right now, but if you look at the example HTML output linked there, for a hello world, my broader point still stands, which is "translation 78.1%". That is, taking the MIR and turning it into machine code, first through LLVM IR, is the vast, vast majority of the compile time. Not the safety checks, not some un-optimized algorithm in them.
First up is macro expansion, which takes over half a second.
time: 0.338; rss: 169MB coherence checking
I'm sorta surprised this takes even a third of a second; it's never come up when I've looked at these results before. Coherence is the rule that says "you can only implement a trait for a type if you've defined at least one of them in your crate".
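For anyone unfamiliar, here's a minimal example of my own showing what coherence checking rejects:

```rust
use std::fmt::{self, Display};

// error[E0117]: only traits defined in the current crate can be
// implemented for arbitrary types. Neither `Display` nor `Vec<u8>`
// is defined in this crate, so coherence rejects the impl.
impl Display for Vec<u8> {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{} bytes", self.len())
    }
}

fn main() {}
```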
time: 0.596; rss: 205MB item-bodies checking
It's early so maybe I'm slightly wrong, but I believe this pass includes the type inference stuff, because that's only done inside of function bodies. 6/10ths of a second isn't nothing, but...
The borrow-checking numbers are actually something I'm quite surprised by. Right now, we borrow-check twice, since we're still working on the MIR-based borrow checker. I'm not sure that MIR borrow checking is supposed to be this much slower; I've pinged the compiler devs about it. Regardless, before this was introduced, we'd have shaved 2.5 seconds off of the compile time, and after the old one is removed, even if the borrow check doesn't get faster, we'd be shaving off half a second.
Then we get into a ton of LLVM passes. As you can see, most of them take basically no time. Ones that stick out are mostly codegen passes, which is what I was referring to with my comments above. But then we have:
time: 5.625; rss: 413MB translate to LLVM IR
Even just turning MIR into LLVM IR takes five seconds. This alone dwarfs the half-second and third-of-a-second times from before. In all:
time: 7.418; rss: 415MB LLVM passes
So, all the other passes take two seconds out of seven, combined. Ultimately, none of this is Rust's checks as a language, this is burning away all of the abstractions into lean, mean code. Finally,
time: 9.437; rss: 414MB translation
this is the "turn LLVM IR into binary" step. This also dominates total execution time.
Note that all of this is for an initial, clean build, so a lot of that stuff is setting up incremental builds. I deleted src, git checked it out, and then built again, and got this: https://gist.github.com/steveklabnik/1ed07751c563810b515db3f... way, way, way less work, and a faster build time overall: just five seconds.
> Ultimately, none of this is Rust's checks as a language, this is burning away all of the abstractions into lean, mean code.
Right - my understanding is that Rust generates very large MIR and also LLVM IR because the "zero-cost abstractions" aren't yet zero-cost, and compiling them into zero-cost machine code is inherently expensive.
So it's not the safety checks per se, but it's things like "there are three wrapper objects here with various generic parameters where C/C++ would have started with just a pointer from the beginning". Those three wrapper objects couldn't have existed without the safety checks, and rustc very quickly identified that everything is in order, but then it monomorphized the generics into lots of tiny functions and LLVM has to optimize all of that into machine code that resembles what the (unsafe) pointer approach would have generated. (Which explains why it's not time taken in rustc proper, but also why we don't see LLVM being equally slow when compiling C/C++.)
I might split out C++ from C here, though; template-heavy C++ is known to be pretty slow to compile too, for what are basically similar reasons: you end up generating a bunch of code that needs to be crunched down.
There are also some possibly deeper reasons around heavy use of generics: if your code touches something that is generic and you change the type, that function needs to be recompiled too. And then anything generic it touches, and so on. Trait objects can help compile times for this reason, but they aren't considered idiomatic, so most Rust code tends to stress the compiler in this way.
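A sketch of that trade-off (my own example, not from the thread):

```rust
use std::fmt::Display;

// Generic version: monomorphized, so each new T at a call site
// forces a fresh copy of this function to be compiled and optimized.
fn show_generic<T: Display>(value: T) {
    println!("{}", value);
}

// Trait-object version: one compiled body, dispatched through a
// vtable at runtime; changing a caller's type doesn't recompile it.
fn show_dyn(value: &dyn Display) {
    println!("{}", value);
}

fn main() {
    show_generic(42);    // instantiates show_generic::<i32>
    show_generic("hi");  // instantiates show_generic::<&str>
    show_dyn(&42);       // the same single body handles both
    show_dyn(&"hi");
}
```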
Most Java compilers are quick because they compile down to bytecode, leaving virtually all optimization to the JIT.
AOT Java compilers certainly exist, but I don't recall Microsoft having one -- I fully admit my recollection of their Java dev environment is quite foggy at this point, though.
Regardless, the Java AOT compilers I've tried didn't seem particularly fast in relation to their level of optimization.