Reducing Rust Incremental Compilation Times on macOS (jakedeichert.com)
186 points by yannikyeo on April 18, 2021 | hide | past | favorite | 66 comments



Compile times in rustc have been steadily improving with time, as shown here - https://arewefastyet.rs.

Every release doesn't make every workload faster, but over a long time horizon, the effect is clear. Rust 1.34 was released in April 2019 and since then many crates have become 33-50% faster to compile, depending on the hardware and the compiler mode (clean/incremental, check/debug/release).

Interestingly, the speedup mentioned in OP won't show up in these charts because that's a change on macOS and these benchmarks were recorded on Linux.

What is expected to be a gamechanger is the release of cranelift in 2021 or 2022. It's an alternate debug backend that promises much faster debug builds.


Are long compile times just a product of Rust’s design? Or is there an issue with the compiler? Based on my experience with Rust I would say it’s the former. It’s doing so many more checks for you than other compilers to prevent lots of problems that it’s bound to be slow.

I appreciate that they continue to speed up compilation/add flags. If the compiler gets slower every release it will eventually become unusable. I think the long compilation times will continue to get worse in the future as the rust compiler gets better at checking for errors, and I’m not sure what can really be done about that.

One thing that irks me is that they throw around the phrase zero cost abstraction a lot. And I just don’t buy it. Sure I get it, lots of things can be monomorphized so we don’t pay a runtime cost, but there is still a large compilation hit to do that, and it’s not zero cost to the programmer. One need only look at Chandler Carruth’s CppCon talk “there are no zero cost abstractions” to understand that.


> It’s doing so many more checks for you than other compilers to prevent lots of problems that it’s bound to be slow.

The checks aren't the longest part of compile times, so that's not really it.

It is true that there are aspects of the design that lead to slower compile times than other languages, but it's more of what you're talking about at the end: monomorphization is great for runtime speed, but increases compile times. There's a lot more complexity here than simply "rust's safety checks are slow."

(And, "zero cost abstractions" was always speaking about runtime cost.)


If monomorphization is the killer, has the rust team considered adding a flag to compile all generics as trait objects in debug mode? Maybe there are some reasons this is impossible I haven’t thought of.


People have asked, but only some traits are object safe, for example. Maybe someone could come up with a way to make it work, but nobody has attempted to put in that work, and it would be much more complex than "oh just make this flag run the "trait object" codegen instead of the "monomorphization" codegen."

Also, in general, we don't like flags that change language semantics. This one might be acceptable though, but I'm not on the lang/compiler teams.


Why would it change language semantics? Haskell’s polymorphism to my eye seems about the same as rust’s, and it doesn’t need to worry about object safety


Haskell's polymorphism isn't the same as Rust's. Rust has two kinds of polymorphism.

Static dispatch:

  use std::fmt::Display;

  fn show<T>(a: T) -> String
  where T: Display
  {
    // to_string() is available for every type that implements Display
    a.to_string()
  }

  fn main() {
    show(1_i32);
    show(1.0_f32);
  }
is (handwaves) first translated to a version with no generics

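  // one copy of the function per concrete type it was called with
  fn show_i32(a: i32) -> String {
    <i32 as ToString>::to_string(&a)
  }

  fn show_f32(a: f32) -> String {
    <f32 as ToString>::to_string(&a)
  }

  fn main() {
    show_i32(1_i32);
    show_f32(1.0_f32);
  }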
You can probably see how the more permutations of functions you need to generate the slower it gets.

Dynamic dispatch:

  use std::fmt::Display;

  fn show(a: Box<dyn Display>) -> String {
    a.to_string()
  }

  fn main() {
    // the values have to be boxed to become trait objects
    show(Box::new(1_i32));
    show(Box::new(1.0_f32));
  }
This translates to (handwaves):

  fn show(a: Box<dyn Display>) -> String {
    // pseudocode: lookup_fn and typeid_of are not real functions
    let fmt_fn = lookup_fn(Display::fmt, typeid_of(a));
    fmt_fn(a)
  }

  fn main() {
    show(Box::new(1_i32));
    show(Box::new(1.0_f32));
  }
This is simple to compile, but you need to do some extra work at runtime.
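
To make the dynamic case slightly more concrete: as far as I know, in current rustc a `&dyn Display` is a "fat" pointer that carries a vtable pointer next to the data pointer, so the method is found through that vtable rather than through a lookup keyed by type id. A minimal sketch (the names `show_dyn` and `dyn_ref_is_two_words` are made up for illustration):

```rust
use std::fmt::Display;
use std::mem::size_of;

// One compiled copy of this function serves every Display type;
// the formatting call goes through the vtable stored in the reference.
fn show_dyn(a: &dyn Display) -> String {
    a.to_string()
}

// A trait-object reference is two machine words:
// a data pointer plus a vtable pointer.
fn dyn_ref_is_two_words() -> bool {
    size_of::<&dyn Display>() == 2 * size_of::<usize>()
}
```

Calling `show_dyn(&1_i32)` and `show_dyn(&1.5_f32)` both go through the same compiled function, which is the compile-time saving being discussed.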


That's not an answer though. I believe (but I know little of both Haskell and Rust) that Haskell polymorphism is at least as powerful as rust static polymorphism but it is normally implemented via type erasure.

I think the real reason is that rust gives additional guarantees about lack of allocation and object layout that are hard to implement with dictionary passing (BitC tried and failed).


I agree somewhat, in Haskell you can write generic functions without traits, and I think they are not monomorphized. (map for example.)

However, I think these all operate on boxed objects, which is a little bit easier to do than for Rust, where only some objects are boxed.

In short, I agree.


I mean, object safety is one exact example. Every trait can be monomorphized, but not all traits can be made into an object. So "turn every generic into an object" just doesn't work. When you request a generic bound, you're saying that you want certain performance characteristics.
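
A small sketch of what that looks like in practice. The `Describe` trait and `Point` type are hypothetical, chosen only because a generic method is one of the things that makes a trait not object safe:

```rust
use std::fmt::Display;

// A trait with a generic method. It works as a generic bound,
// but it is not object safe, so `dyn Describe` is rejected.
trait Describe {
    fn describe_with<T: Display>(&self, extra: T) -> String;
}

struct Point {
    x: i32,
    y: i32,
}

impl Describe for Point {
    fn describe_with<T: Display>(&self, extra: T) -> String {
        format!("({}, {}) {}", self.x, self.y, extra)
    }
}

// Static dispatch: compiles fine and is monomorphized per concrete D.
fn describe_static<D: Describe>(d: &D) -> String {
    d.describe_with("via generics")
}

// Dynamic dispatch: uncommenting this fails to compile (error E0038),
// because a vtable cannot hold one entry per possible `T`.
// fn describe_dynamic(d: &dyn Describe) -> String {
//     d.describe_with("via trait object")
// }
```

So a flag that mechanically turned `describe_static` into the commented-out form would have nothing valid to generate for this trait.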

Haskell's polymorphism is similar, it's true, but I can't speak to their semantics well enough to intelligently comment. They are much less concerned with performance than Rust is, so they tend to do less of the sorts of optimizations that get Rust into trouble here.


But that's just playing with definitions. Rephrase it as "zero runtime cost abstraction" and you get back to the intended meaning.

Incidentally your first question is one that's very amenable to testing. Where does the Rust compiler spend most of its time? Is it at the checking stage?


> Where does the Rust compiler spend most of its time? Is it at the checking stage?

rustc has a self-profiler that can be used to answer this question [0], as well as a mode that times compiler passes [1].

There's no single reason the Rust compiler is slow, as it depends quite heavily on the code being compiled. For some codebases, LLVM code takes up most of the time; in other codebases (e.g., extremely generic-heavy codebases), it'll be checking-related passes.

[0]: https://github.com/rust-lang/measureme/blob/master/summarize...

[1]: https://wiki.alopex.li/WhereRustcSpendsItsTime


Oh come on now, do you and the parent really have to be those guys who go "well technically they're not zero cost because the programmer has to wait"?

Yes, you're technically correct, but also being unreasonable.

It was pretty clear that "zero cost abstractions" refers to the runtime aspect. That is the default meaning, there's nothing to "get back to".


Huh? That was exactly my point.


there are some rust programs where the most time is spent in llvm, others hit slow paths in trait bounds analysis, others hit other spots. there is no single hotpath


It isn’t even zero runtime cost abstraction though. Monomorphizing all generic functions increases code size. Increased code size causes more instruction cache pressure which may actually slow things down, depending on the application. I could imagine if you’ve got some very large functions that are called on a wide variety of objects then every call could blow out the Icache and kill the hit rate.


It's an abstraction over writing out the monomorphized version by hand. Zero-cost doesn't mean using the abstraction gets you the fastest possible implementation of whatever you're doing.
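
As an illustration of that reading of "zero-cost", here is the same computation written once with an abstraction (iterator adapters) and once by hand. This is only a sketch with made-up function names; it shows that the two forms are interchangeable in behavior, and whether the generated machine code is actually identical depends on the compiler version and optimization level:

```rust
// Written with iterator adapters (the "abstraction").
fn sum_even_squares_iter(values: &[u64]) -> u64 {
    values.iter().filter(|v| **v % 2 == 0).map(|v| v * v).sum()
}

// Written out by hand (what the abstraction stands in for).
fn sum_even_squares_loop(values: &[u64]) -> u64 {
    let mut total = 0;
    for &v in values {
        if v % 2 == 0 {
            total += v * v;
        }
    }
    total
}
```

The claim is only that the first version should not cost more at runtime than the second, not that either is the fastest possible way to get the answer.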


No, type checking isn't the whole story. I think most people vastly overestimate how complex those algorithms actually are; you can have fast type checking for a lot of languages that seem really advanced with very expressive features. OCaml is a good example; the compiler is extraordinarily fast despite being a much more high level and powerful language than a lot of others. It's got good code generation too. In the case of Rust, monomorphization is a bigger contributor.

Making a compiler fast is largely a goal you have to continuously strive for, and for the most part doesn't come for free, no matter the language. Saying "it's an issue with the compiler" makes it sound like the Rust team consciously made some horrible design decisions or something, but that isn't evident from anything we can immediately observe.

The story I suspect is correct and vastly, vastly more boring is that they probably focused on "the language" (features, APIs, etc) for a really long time and only started focusing on compiler performance in more recent memory, once users started getting more irritated, so it ran away from them a lot. That seems to happen to every compiler these days; features are what get monetary and mindshare support from users. Down-in-the-dirt .5% performance wins day in and day out, which you have to do a lot of to actually improve things most times, normally aren't fun, nor something people roll out the red carpet for.


Having repeatedly benchmarked my own rust builds I've found that the majority of time is spent linking. There are promising projects to improve that by an order of magnitude.

> I think the long compilation times will continue to get worse

No, it's more likely they'll get better - that has been the trend, and as I mentioned there are projects in the works that can likely cut the compile times in half.


"Zero cost abstractions" is just fancy-speak for "exploiting optimization passes" to convert dumb code generated by a template- or macro-system into machine code that would be generated from "less dumb code".

You get the same "zero cost abstraction" effect in lower-level languages like C (but you need to feed it "dumb code" to see a similar effect). Nobody talks about "zero cost abstractions" in the context of C programming because C doesn't encourage you to "obfuscate" source code with high level abstractions.

Not sure if optimizer passes are the main reason for Rust being slow though, I'd guess it's more the static code analysis, e.g. a C/C++ compiler with static code analysis enabled is also many times slower than regular compilation (but Rust should have an advantage there, because Rust doesn't allow as much freedom as "less correct" languages, so the static analysis can be more focused).


Except you can't build a lot of those same abstractions in C, at least not in a way that doesn't involve throwing void pointers everywhere. Also the compiler has no idea what it's looking at so it can't optimize it as well because of pointer aliasing rules. So no, you can't build the same abstractions in C as cheaply as you can in Rust.


That assumes that "abstractions" are a good thing in the first place though. I haven't seen much evidence for this assumption in real world code so far (in the context of "understanding what this piece of code actually does"). I've mainly been exposed to C++ code though (where "over-abstraction" is quite common), but I also haven't seen much Rust code so far that I would consider particularly "straightforward" and/or "readable". YMMV of course.


Rust heavily relies on such things for safety checks, without loss of speed. We could get the same safety with runtime checks, but that would compromise other goals.


C is an abstraction over assembly that is an abstraction over microcode and so on and on.

So C programmers must agree that some abstraction is good and it is at least plausible that C is not the best abstraction layer for every single application.


If abstractions were bad ideas we wouldn't have compilers and we'd handcode in assembly. We also wouldn't have hardware that keeps pretending to be a PDP-8 machine even though it's completely reordering the instructions that it runs and predicting every branch taken.

Our world is full of abstractions in basically every single thing we use.

Also to be clear, I mean "abstractions" not "obfuscation" which many people (common in Java and C++) call "abstraction" even though it is nothing of the sort.


It isn't "fancy speak" because those abstractions - language features - are intentionally designed so as not to have a non-zero cost implementation. It isn't an optimization pass and the programmer doesn't need to worry about a degenerate case causing it to fall through to a less optimized implementation.


> Are long compile times just a product of Rust’s design? Or is there an issue with the compiler?

Actually rustc has many modern features that, say, C++ compilers don't. For example, clang is not an incremental compiler, nor does it do parallel codegen. Rust also has had work on multithreading support. These features were all introduced after rustc already existed. It would have been unimaginably hard to retrofit clang with this, as C++ is way harder to refactor than Rust.

So I'd put it onto the design instead of the compiler. But it's not the safety features of Rust that cause the main slowdown, at least not directly. Yes, borrow checking is some additional step rustc has to do, and NLL introduction took a major engineering project to not regress compiler performance, but overall these safety checks don't make up such a large part of the compile process.

First, Rust has larger compile units than C. If you change one file in Rust, the entire crate has to be recompiled, while in C only the single file needs recompilation (unless it's a header, then everything that imports it needs recompilation). That's also why incremental compilation is much more important for Rust than it is for C (and it increasingly becomes important for modern C++ as compile units increase).

Second, in Rust you compile everything, including your dependencies. It's not like in C or C++ where you install *-dev packages for the heavy libraries. In fact, if you compiled everything yourself in the C/C++ world, often you'd get similar compile times or even worse ones. Rust has an unstable library format. It's literally just mmap'ed internal data structures. Two different versions of the compiler can't reuse the same library, after a compiler update you have to recompile everything from scratch, and if you want to be able to add new dependencies or cargo update, you need to always use the latest compiler, which gets released quite often. So many projects that I maybe touch once every 6 months I basically have to recompile from scratch again. This is all due to design and policy questions, not because the compiler itself is slow.

Third, in Rust you statically link everything. This puts greater load onto the linker which now has to copy large amounts of data to obtain large binaries. There is no stable ABI so you can't create a dynamically linked library with a safe interface (you can of course create one with an unsafe C interface but that's not nice). There is also no cargo support for it.

Fourth, heavy use of monomorphization. A lot of libraries in Rust are generic either on lifetimes or on types. Lifetimes can be stripped, but if your code is dependent on a type it gets copied and then recompiled for every different type. This is a design question and has benefits in the final program as the code can be optimized for the type. But it has to be optimized. This incurs a compile time cost.
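
One common way library authors limit that fourth cost is to keep the generic part of a function as a thin shell around a non-generic body. A minimal sketch, assuming a made-up helper `file_stem_len` (not from any real crate):

```rust
use std::path::Path;

// Generic shell: monomorphized once per caller type P, but it is tiny.
fn file_stem_len<P: AsRef<Path>>(path: P) -> usize {
    // Non-generic body: compiled once, no matter how many P types exist.
    fn inner(path: &Path) -> usize {
        path.file_stem().map(|s| s.len()).unwrap_or(0)
    }
    inner(path.as_ref())
}
```

Callers can still pass a `&str`, a `String` or a `PathBuf`, but only the one-line conversion is duplicated per type; the real work lives in `inner`. It trades a little boilerplate for less code to optimize, and it does not help when the body genuinely depends on the type.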


"Safe" and "unsafe" is quite orthogonal to the ABI stability issue. Whether a library call is "safe" has to do with on what constraints have been placed on the library code, not what ABI is used.


Rust supports a stable but unsafe ABI, the C ABI (extern "C"). Rust also supports a safe but unstable ABI, the Rust ABI (extern "Rust"). But there is no ABI that Rust supports that is both safe and stable. Full list of supported ABIs: https://doc.rust-lang.org/reference/items/external-blocks.ht...
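
For illustration, this is roughly how the two ABIs look in source. The function names are hypothetical, and a real shared library would also need an unmangled symbol name and a suitable crate type, which this sketch leaves out:

```rust
// Uses the C calling convention: stable across compiler versions, but the
// signature should stick to types with a C-compatible representation.
pub extern "C" fn add_c(a: i32, b: i32) -> i32 {
    a + b
}

// Uses the Rust ABI (the default when no ABI is written): it may change
// between compiler releases, so it is not suited to a stable library boundary.
pub extern "Rust" fn add_rust(a: i32, b: i32) -> i32 {
    a + b
}

// The ABI is part of the function pointer type.
pub fn c_abi_pointer() -> extern "C" fn(i32, i32) -> i32 {
    add_c
}
```

From inside Rust both functions are called the same way; the difference only matters to code on the other side of a compiled library boundary.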


Except Visual C++ and C++ Builder, among others, do have all those features you say C++ doesn't have by focusing on clang.

C++ is not a one-trick pony, it is an ISO standard with multiple implementations.


Visual C++ to my knowledge does not support incremental or parallel compilation within the compile unit. It does support incremental linking though, and parallel compilation of multiple compile units that don't depend on each other. Features which both clang and rustc have btw (well clang only concerns one compile unit, so the parallelism depends on how you call clang, but many clang calling build systems support parallelism).

No idea about C++ Builder. Do you have links to documentation to back it up? I'm curious :).


You didn't mention it was within the same compile unit, just parallel code generation.

In any case, have a look here,

https://docs.microsoft.com/en-us/cpp/build/reference/cgthrea...


Most C++ compilers support precompiled headers, which would be the closest equivalent to incremental compilation within a single compilation unit.


It's also a mix of compiler issues and the approach of compiling everything from scratch.

A debug backend or an interpreter-based one, e.g. OCaml's or GHCi, could help quite substantially.


> and the approach of compiling everything from scratch.

That is not really true. Yes, it would help first compiles a bit, but only the initial compiles. That's a one-time improvement, that while important, doesn't solve the big problems.


You keep downplaying this, however this is a problem I never have with other compiled languages, which don't force developers to buy a new computer just to make it bearable.

With Rust it is almost impossible to just check out a random project without investing the time to wait for its build to finish.

Also maybe I am special, but it is quite typical to switch project branches quite often, or just get libraries from another project.

All of which I can easily do via binary library staging.


Again, the issue of compile times is more complex than "just do this one thing and it's fixed."

After your first cargo build, you have all of the dependencies pre-compiled. The compile time of every single "cargo build" after that would be completely unaffected by pre-built dependencies. You could try sccache; builds are still slow. Most people care about those secondary builds, not about the initial build, so telling them that this solves their problem is simply not true. It's not downplaying it; as I said, the first builds are also important. Unfortunately, it's just not that simple.

If solving Rust builds were as easy as this, we'd have just done it long ago. We care tremendously about improving compile times. It is a constant request on surveys. We put a lot of engineering work into this.


I just use Rust for toy projects, my work is all around Java, .NET and C++, so my feedback is more from the language geek point of view that would like to see Rust gain more adoption on the kind of workflows I use those languages.

So while it may seem like bashing, it is more trying to be a positive critic of something that really matters and is a show stopper for some environments.

In any case I look forward to any improvement in build times, and an eventual story regarding binary libraries.


I don't think it's bashing, I just don't think that it's going to be anything more than one small part of an overall improvement strategy, and isn't the most important one.


Good to know that you don't take it the bad way. All the best for Rust efforts.


If you work on multiple branches you could use git worktrees


(Also, the only time multiple branches cause issues is if the two branches have different dependencies, and again, only on the first compile. Both the source and the compiled artifacts don't live in source control, and so swapping branches doesn't trigger a re-build of them.)


There is work on integrating Cranelift into rustc (https://github.com/bjorn3/rustc_codegen_cranelift) so that rustc can compile to bytecode for the Cranelift JIT.


I am aware of it, on my toy projects it saved 5 minutes out of 20 in a clean build.


We also should not forget how many years and engineering hours were spent on C++ compilers.

Anybody know of a 2000 vs 2020 C++ compilation comparison?


You can also turn off debuginfo completely. Personally, as someone who does printf debugging, I mainly need it to debug segfaults, which are really rare in Rust. Sometimes the call stack of a panic is useful as well, but if I need debuginfo I can just re-enable it.

https://github.com/est31/cargo-udeps/commit/e550d93c7a6d756e...


Author here. I just tried disabling debug info (debug = 0) for the first time and it looks like my recompile times shave off another 500-1000ms which isn't bad!


Glad to have been of help!


> Dev rebuilds lasting 14 seconds was getting me a bit worried

I wish my dev builds only took 14 seconds...


Author here. How long do your dev builds take, and what is it that you are building? Curious what kind of improvement you get with split-debuginfo enabled!

This project is a very small hyper/tokio backend API. However, I also work on a CLI tool [1] I created which has very fast recompile times, usually less than a second. Mostly because it's pretty light on dependencies.

[1]: https://github.com/jakedeichert/mask


In C++ land I’ve experienced the gamut from multi-minute clean builds to 30 minutes to 1.5 hours on the latest & greatest CPUs, even with ccache. 14s sounds like paradise to me.


Honest question, is there really no way to know whether what you wrote is working or not without waiting 20min-1H?? That sounds like hell.

Were these incremental compilations?


That’s usually clean builds. Incremental builds can range from instant to 1-5 min depending on what line of code you change (deploy a new compiler and you’re probably rebuilding everything).


Oh ok, that makes much more sense.


It’s a C++ project for an embedded device, so my comment was more a reflection on build times in general rather than being Rust related.


On the other hand I just gave up expecting sccache to build on my travel netbook, > 400 dependencies, really?

apt-get install rustlibXYZ-dev can't come soon enough.


sccache has pre-built binaries on their releases page. Or you can do:

    cargo install sccache --no-default-features
To avoid building support for all storage backends. You can then add back any backend you're actually using.


Thanks, I failed to find how to download them, besides the note that a login is expected, I could not find the artifacts among GitHub workflows.

However, the point is actually what a big hurdle it is to have to compile crates with npm-like dependency trees all from scratch, vs other compiled languages integrated into the distribution package repositories.


I'm not sure why Rust doesn't emit the debug info in the Mach-O. The executable can still be moved to another system and debugged but it doesn't require a second pass to generate a separate dSYM.


The system linker strips debug info from the executable. The way the debugger finds debug info is that the debug info is available in the object files. A reference to the object files is added to the executable. The debugger can find the object files and read the debug info from the object files. I have no experience with Rust but in my opinion this should be the default behavior for debug builds, no need to generate a dSYM file. The dSYM file is used if you want to ship debug info with your final release build to customers. There's no need for dSYM during regular development workflows.

It's possible to get the linker to keep the debug info by tweaking the section attributes. By default all debug info related sections have the `S_ATTR_DEBUG` flag. If that is replaced with the `S_REGULAR` flag the linker will keep those sections. The DMD D compiler does this for the `__debug_line` section [1][2][3]. This allows for uncaught exceptions to print a stack trace with filenames and line numbers. Of course, DMD uses a custom backend so this change was easy. Rust which relies on LLVM would probably need a fork.

[1] https://github.com/dlang/dmd/pull/8168 [2] https://github.com/dlang/dmd/blob/33406c205b76a8c2b5fb918da1... [3] https://github.com/dlang/dmd/blob/33406c205b76a8c2b5fb918da1...


> There's no need for dSYM during regular development workflows.

I'm not sure that's true. I was not getting line numbers in backtraces in a Rust program I developed because I copied the executable to another directory. I had to add a symlink to the dSYM directory to make it work.


Didn't know dmd did that.


The dSYM is basically just a Mach-O anyway. rustc used to just emit the debug info as part of the main binary and then ask dsymutil to split it off.

If you do want to retain it entirely you can set `split-debuginfo` to `off` and it will remain in the final executable.

The advantage of the `unpacked` version is that the debug info just stays associated with the original object files instead of the executable. This makes handling them harder obviously but is good enough for a lot of use cases.
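
For anyone who wants to try it, a sketch of the Cargo profile setting being discussed, assuming a toolchain recent enough to accept `split-debuginfo` in Cargo.toml. The comments summarize my understanding of the three values; check the Cargo and rustc docs for your platform:

```toml
# Cargo.toml (applies to `cargo build` without --release)
[profile.dev]
# "unpacked": debug info stays with the individual object files
# "packed":   debug info is collected into one bundle (a .dSYM directory on macOS)
# "off":      debug info is not split out (on macOS this skips the dsymutil step)
split-debuginfo = "unpacked"

# Alternative mentioned elsewhere in this thread: emit no debug info at all
# debug = 0
```

With "unpacked", moving the executable away from its object files can lose line numbers in backtraces, which matches the symlink workaround described above.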


Why is this not the default? What’s the downside?


I think because split-debuginfo was only recently stabilized (Nov 2020).

https://github.com/rust-lang/rust/pull/79570

Maybe the default will be switched in the future.




