Cranelift, Part 4: A New Register Allocator (cfallin.org)
156 points by cfallin on June 9, 2022 | 16 comments



Is Cranelift basically an LLVM, but in Rust?


Cranelift is a general-purpose compiler like LLVM is, but its goals are different: it is targeted toward applications like JITs where faster compilation is important. It is intended to be a peer of browser JITs' optimizing tiers. In our README [0] we link some results where we're ~14% slower than LLVM but with ~10x faster compilation.

We also have an explicit focus on correctness, simplicity, and verification. One could argue that in practice LLVM is used everywhere and has dozens of active core contributors, that bugs and missed optimizations are shallow at that scale, and it's hard to compete with that; and there is some merit in that... but our codebase is two orders of magnitude smaller, and we're actively engaging with academics and designing things -- our lowering DSL, our regalloc's symbolic verifier, our fuzzing-first approach -- to get the most mileage we can out of our efforts. It seems to be working OK so far!

[0] https://github.com/bytecodealliance/wasmtime/blob/main/crane...


> explicit focus on correctness, simplicity and verification

Does this mean Cranelift has a clearer view of what its IR really means than, say, LLVM? It seems to me that being very clear-eyed on this will be important as C and C++ get ready to bite the bullet (perhaps this decade) and formally document how pointer provenance works in their languages, and as Aria's provenance "experiment" in Rust perhaps begins its journey to stabilisation.


That's the goal at least! We explicitly do not have a notion of "undef" or "poison" values in our IR, and to the largest degree possible we want determinism (modulo e.g. some NaN-related stuff right now). Our current discussion with some researchers wanting to formally verify our lowerings will likely push us toward the start of some formal spec for the IR as well, though exactly how that will work or be maintained is not yet decided.
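
To make the "no undef/poison" point concrete, here's a toy Rust sketch (mine, not Cranelift code) of what fully-defined shift semantics look like: CLIF's ishl masks the shift amount to the type width, whereas an oversized shift in LLVM IR produces poison.

    // Toy illustration (not Cranelift's implementation): with undef/poison,
    // an oversized shift like `shl i32 %x, 40` has no defined result; with
    // fully-defined semantics the shift amount is masked to the bit width
    // (as CLIF's `ishl` documents), so constant folding is a pure function.
    fn ishl_i32(x: u32, amt: u32) -> u32 {
        x << (amt & 31) // mask the shift amount; every input has one answer
    }

    fn main() {
        assert_eq!(ishl_i32(1, 40), 256); // 40 & 31 == 8, so 1 << 8
        println!("{}", ishl_i32(1, 40));
    }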

I'm not super-familiar with the pointer provenance work in Rust but I'll read more about this; thanks for the mention!


How far do you think the effort to integrate Cranelift as an optional rustc backend is from shipping?


That's a good question for bjorn3, the leader of that effort; they periodically publish status updates. In the Reddit discussion, someone also pointed to this GitHub thread: https://github.com/rust-lang/rust/pull/81746#issuecomment-10...


Such a wonderful series of articles! Beautiful work.


Register allocation is a classically hard (NP-hard!) problem

I thought one of the huge advantages of using SSA is linear-time register allocation?


No. First, most compilers actually leave SSA form, via phi elimination, before they begin register allocation, and once you do that you have given up that property. Beyond that, coloring a perfect graph (which an SSA interference graph is, being chordal) can be done in quadratic time, but coloring is not the most interesting part of the process, since you almost always have to insert spills, which modify the interference graph. Register allocation is more than just coloring. On top of that, most ISAs have complications, like specific registers being pinned for specific operations or register aliasing on x86-64, that make things hairier.

In practice, theoretical optimality matters far less than the actual heuristics you use (e.g., avoiding spills inside a loop).
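
To make "avoiding spills inside a loop" concrete, here's a rough Rust sketch (hypothetical names, not regalloc2's actual algorithm) of the classic Chaitin-style heuristic: weight each use by 10^loop_depth and spill the cheapest live range when registers run out.

    // Hypothetical sketch: a spill metric that weights each use by
    // 10^loop_depth, so values touched on every loop iteration stay in
    // registers and the cheapest live range is the one spilled.
    struct Use {
        loop_depth: u32, // nesting depth of the block containing the use
    }

    struct LiveRange {
        vreg: usize,    // virtual register this range belongs to
        uses: Vec<Use>, // every def/use site of that virtual register
    }

    fn spill_cost(range: &LiveRange) -> f64 {
        range.uses.iter().map(|u| 10f64.powi(u.loop_depth as i32)).sum()
    }

    // When no physical register is free, evict the cheapest candidate.
    fn choose_spill(candidates: &[LiveRange]) -> usize {
        candidates
            .iter()
            .min_by(|a, b| spill_cost(a).partial_cmp(&spill_cost(b)).unwrap())
            .map(|r| r.vreg)
            .expect("at least one spill candidate")
    }

    fn main() {
        let hot = LiveRange { vreg: 0, uses: vec![Use { loop_depth: 2 }] };
        let cold = LiveRange { vreg: 1, uses: vec![Use { loop_depth: 0 }] };
        assert_eq!(choose_spill(&[hot, cold]), 1); // spill the cold value
    }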


Whenever I see "heuristic", I think of the "Bitter Lesson" [1] and ask how we can apply brute-force computation to the problem.

It would seem there is still ample space to study the application of neural networks to register allocation. [2] They have already given pretty good results for branch prediction. [3]

What are your thoughts on applying NNs to RA, and how would you structure the training set?

[1] http://incompleteideas.net/IncIdeas/BitterLesson.html

[2] https://www.semanticscholar.org/paper/Real-time-physical-reg... (I haven't read the paper; I just found it in a quick search.)

[3] https://cseweb.ucsd.edu/~atsmith/rnn_branch.pdf


The second paper you are referencing is not quite what you think it is. It is the application of a neural network to the processor itself, to assist in performing physical register renaming at runtime in order to achieve out-of-order execution. This is a very different problem from a compiler's register allocation (for one thing, you're dealing with hundreds of potential registers vs. something like 16-30).

On that topic, NNs for branch prediction are, in some sense, nothing new; perceptron and perceptron-based designs have existed for at least 20 years. But I'm not aware of anything concrete or specific in recent BP designs (beyond marketing hype), though I haven't kept up with the area; maybe some variation of TAGE with model assistance is out there, but I'm not sure.
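
For context, the perceptron predictor mentioned above boils down to a per-branch dot product over the global history. A rough Rust sketch of the idea (simplified, not any shipping design):

    // Sketch of the classic perceptron-predictor idea (Jimenez & Lin, ca.
    // 2001); real designs add PC-indexed weight tables, tuned thresholds,
    // and hybrid/TAGE integration.
    const HISTORY_LEN: usize = 16;

    struct Perceptron {
        bias: i32,
        weights: [i32; HISTORY_LEN],
    }

    impl Perceptron {
        fn new() -> Self {
            Perceptron { bias: 0, weights: [0; HISTORY_LEN] }
        }

        // Dot product of the weights with the global branch history,
        // where taken = +1 and not-taken = -1.
        fn output(&self, history: &[i8; HISTORY_LEN]) -> i32 {
            self.bias
                + self
                    .weights
                    .iter()
                    .zip(history.iter())
                    .map(|(w, h)| *w * (*h as i32))
                    .sum::<i32>()
        }

        fn predict(&self, history: &[i8; HISTORY_LEN]) -> bool {
            self.output(history) >= 0
        }

        // Train only on a misprediction or when confidence is low.
        fn train(&mut self, history: &[i8; HISTORY_LEN], taken: bool, theta: i32) {
            let t: i32 = if taken { 1 } else { -1 };
            let y = self.output(history);
            if (y >= 0) != taken || y.abs() <= theta {
                self.bias += t;
                for (w, h) in self.weights.iter_mut().zip(history.iter()) {
                    *w += t * (*h as i32);
                }
            }
        }
    }

    fn main() {
        // Toy usage: a branch that is always taken quickly trains to "taken".
        let mut p = Perceptron::new();
        let history = [1i8; HISTORY_LEN];
        for _ in 0..4 {
            p.train(&history, true, 14);
        }
        assert!(p.predict(&history));
    }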

At this moment, I do not know what a training set or neural-network model for performing register allocation on real-world programs would look like.


The second paper is more of an existence proof that folks are thinking about the problem. At a high level, register-file selection and register renaming are problems similar to register allocation in a compiler. The paper itself deals with hundreds of virtual registers and how to map them onto the renamer.

TAGE is what I was thinking of [1,2]. Thanks for the reminder.

I found some relevant research on neural register allocation: "2020 LLVM in HPC Workshop: Deep Learning-based Approx. Graph-Coloring for Register Allocation"

https://www.youtube.com/watch?v=4FW7iznzIoE

There is also some nascent work on applying neural techniques to constraint optimization problems.

[1] https://www.semanticscholar.org/paper/A-case-for-(partially)...

[2] https://www.semanticscholar.org/search?q=TAGE%20branch%20pre...


One can do optimal regalloc over SSA if one doesn't have to spill, or split live ranges to make the spilling better; but spilling and splitting are the most interesting and relevant parts of practical register allocation, so in practice this isn't quite true.


Next time someone says "well-written code doesn't need comments" I think I will just refer them to this page.


I doubt they'll find it a convincing argument, because it relies on a fairly uncharitable interpretation of the claim. Rather than interpreting it as "literally no well-written code needs comments," I would interpret it as "well-written code doing straightforward things (as most code is) doesn't need comments." Complex algorithms trying to quickly approximate an NP-hard search problem in a complicated environment are a fairly obvious exception to the rule, even if you believe the rule is almost always applicable.

I'm also not sure when I last heard someone seriously make this claim...


We all know that is just an excuse for no comments. I haven't heard anyone say that in over 5 years.



