Mir: A lightweight JIT compiler project (redhat.com)
183 points by ksec on Jan 20, 2020 | 63 comments



This project is trying to do too many things.

It seems like Ruby needs a profile guided optimizer, which means building an IR that is suitable for profile-guided optimization. That’s way different from classic IRs like this since it means having provisions for OSR exit.

I recommend looking at these slides to learn how to do it.

http://www.filpizlo.com/slides/pizlo-speculation-in-jsc-slid...

http://www.filpizlo.com/slides/pizlo-splash2018-jsc-compiler...


I agree - I think MIR is trying to produce better machine code more quickly, but that isn't the problem Ruby faces! The problem Ruby faces is having to inline through ten levels of metaprogramming in order to get any meaningfully-sized compilation unit that you can optimise, and only then does it make sense to worry about code generation.

However, the reason that this is not so simple is that Ruby is a vastly more complicated language than JavaScript (I've worked on implementing both). Ruby has an enormous standard library and most Ruby programs are just endless calls into that library, so your compiler must be able to understand the library semantically, either by rewriting it in Ruby (not likely at scale in MRI) or by adding tens of thousands of individually optimised intrinsics (again not likely).

Ruby is digging itself further into a local optimum with these optimisations, rather than looking around for a better global optimum.


Your description of optimizing Ruby sounds like it is exactly like optimizing JS. JSC’s main optimizing IR (the DFG) is all about understanding the standard library semantically. We write the standard library in JS+hacks (called “builtins”) or C++ (picked on a per-function basis). Some of those functions have opcodes in DFG, or are built out of primitives that have their own specialized opcodes in DFG.

Maybe the reason why Ruby optimization has problems is nobody has done it the JSC way.


> Your description of optimizing Ruby sounds like it is exactly like optimizing JS.

Yes, it's the same problem, and it's unique to neither Ruby nor JS - the difference is just scale.

Ruby has a larger library, so more needs rewriting from the current C into Ruby+builtins. And that rewrite doesn't automatically maintain Ruby semantics (the C API currently used does not exactly match them), so a rewrite that preserves semantics is often very hard.

> Maybe the reason why Ruby optimization has problems is nobody has done it the JSC way.

People have been trying the approach you use in JavaScriptCore (for over a decade, starting with Rubinius, now TruffleRuby and others), but I think (based on practical experience working on optimising both languages) that it's just a larger problem in Ruby, which is why it hasn't been conquered yet.


I think that the way you’re describing the solution for Ruby tells me that you don’t see the problem the way that I see it. The problem isn’t rewriting things in builtins. The problem is making an IR in which you can reason about the standard library at scale: reason about its speculation opportunities, all of the implied dependencies and effects, its GC impacts, etc.

None of the IRs I’ve seen people try for Ruby does that. JSC’s DFG IR does a lot of this. Hence I don’t think folks have really tried the JSC approach for Ruby.


Forgot to mention: baseline JIT with PICs. I keep hearing about sophisticated top tier JITs but that won’t do shit if you don’t have a solid baseline JIT.
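
(For the uninitiated: an inline cache memoizes the method lookup at a call site, keyed on the receiver's shape/class; a polymorphic IC caches several entries. Roughly, in C - the types and the lookup_method slow path here are made up, and a real baseline JIT patches this logic directly into the emitted machine code:)

  #include <stdint.h>

  typedef struct Object {
      uint32_t shape_id;             /* identifies the object's layout/class */
      /* ... fields ... */
  } Object;

  typedef uintptr_t (*MethodFn)(Object *self);

  typedef struct CallIC {
      uint32_t cached_shape;         /* shape last seen at this call site */
      MethodFn cached_method;        /* resolved target for that shape */
  } CallIC;

  extern MethodFn lookup_method(Object *obj, const char *name);  /* slow path */

  static uintptr_t call_with_ic(CallIC *ic, Object *recv, const char *name)
  {
      if (ic->cached_shape == recv->shape_id)    /* hit: skip dynamic lookup */
          return ic->cached_method(recv);
      MethodFn m = lookup_method(recv, name);    /* miss: full lookup */
      ic->cached_shape = recv->shape_id;         /* refill the cache */
      ic->cached_method = m;
      return m(recv);
  }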


> The problem is making an IR in which you can reason about the standard library at scale

How do you deal with functions written in C++ which invoke functions written in JS?


The important ones are intrinsics. The C++ function is understood by the JIT so well that it just emits the code for it itself.

Our C++->JS calling convention sucks, partly because we just avoid going down that path.

We do have a C++->JS call IC that we could use more.


> The C++ function is understood by the JIT so well that it just emits the code for it itself.

But how would that work for Ruby, where C functions on the critical performance path are often third-party code that the compiler author has never seen before?

We'd need to let the JIT understand third-party unseen C functions. There are experiments to do that (Sulong, MIR, Rubinius sort of tried it) but I think it's more of an open problem than you're implying.

If you treat calls to unknown third-party C functions as an opaque native call then you're really going to struggle to build a meaningful compilation unit, in my experience.


It looks like the author is intending to use a C to MIR compiler so that existing CRuby code can be inlined into generated MIR code. Third party code could be compiled the same way, right?

"The blue parts show the new data-flow for MJIT. When building CRuby, we could generate MIR code for the standard Ruby methods written in C. We can load this MIR code as a MIR binary. This part could be done very quickly."


That's what MIR is doing, not what JavaScriptCore is doing.

Filip thinks this isn't needed for Ruby - '[t]his project is trying to do too many things' - and that JavaScriptCore could do it already.

I think the fact that MIR and TruffleRuby believe they have to do something else (lifting C code into their IR) shows us that JavaScriptCore's approach isn't quite as immediately applicable as Filip thinks it is.


It’s more that I think that you won’t get enough semantic understanding of C extensions lifted into any IR for that to be useful.


I buy that third party native code is critical path.

I buy that native code often calls back to Ruby.

I get why you would assume that therefore you need to make it fast for third party native code to call into Ruby. But that’s not how you want to think to succeed at VM optimizations.

You can make native code fast. It’s probably already about as fast as it’s going to be. Design a VM that makes it continue to be fast. Truffle won’t give you that since it will have to DBT the native code to make it fast (and that’s the good case).

I would bet you that first party native code that calls into Ruby is by far the most common kind of yield invocation. Like Array#each/map and equivalents for Hash. You want to treat those specially for two reasons:

- their fastest path for baseline code, if done the way I describe, is faster than any alternative. Baseline isn’t going to have a chance to inline arbitrary functions.

- they are likely to make up a large fraction of cases where native code calls back into Ruby.

For third parties, there’s a future where someone just exposes the JITing API that JSC gives to the DOM.


LuaJIT is an example of what happens if you do this right :-)


I’m building a tracing JIT for CRuby based on some ideas from LuaJIT, including trace stitching.

One major issue is that the optimization scope is limited by how root traces are formed. This is fine in Lua but a much bigger issue for Ruby.


Do you not think the Truffle and Graal IRs do this?


No, they don’t do this.

They try to infer specialization from an AST interpreter. That’s not on the same planet as what I’m talking about.


Well I'm not sure what you think you're referring to then.

Do you mean an IR where all core library routines are first-class citizens, with their own nodes and information about their semantics encoded so that the compiler can reason about them?


Yes, that is what we do in JSC.

With the caveat that core routines that are very cold don’t get included no matter how simple they are, and warm/hot ones get opcodes even if it’s very annoying to do it.


Right, so I get that, and I think so do other people in the small Ruby compiler community. It's just a much larger problem in Ruby due to the size and complexity of the standard library and the presence of more non-local effects (for example, access to dynamic caller frames is often required). It's possibly so large that it's not reasonable to do it like that, and people are looking for other options, like Truffle's PE. That's why it hasn't been achieved yet - it's not because anyone is unaware of common VM techniques.

Another big problem JavaScript doesn't have is that user C extensions are very often on the critical path in Ruby, so you must be able to optimise through them somehow. I don't think JavaScriptCore has any solutions for that which we should be trying?


Chris, JSC’s IRs have extensive support for accessing non-local call frames and do a great job on effects generally. JS has _lots_ of this kind of nonsense in real code, and the language supports it at least as much as Ruby does (in super evil ways, like function.arguments).

JSC has extensive support for fast calls into native code because of the DOM. For example we have the Snippet JIT that allows the DOM to turn hot functions into almost first class compiler ops. Like, callbacks in the DOM can dictate codegen and effect analysis.

Josh, the canonical JSC approach to the iterator callback problem wouldn’t be to use builtins. That wouldn’t necessarily achieve great perf. It’s certainly not great for the baseline tier. I think the only good option is to make those iterators (like Array#each) intrinsic as fuck: all call ICs can ask for inline machine code generation of the loop along with the call back to the passed-in block. You could imagine this enabling inlining of the loop and its body even in the baseline JIT.

I get that people probably see the builtins in JSC and start having wild fantasies about what this can achieve. In reality for things where perf matters, you deploy ICs and custom template codegen.
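
To make the iterator idea concrete, here is a rough C sketch (types and names invented) of the difference between treating Array#each as an opaque native call and letting the call IC emit the loop inline:

  #include <stdint.h>

  typedef uintptr_t Value;
  typedef struct { long len; Value *elems; } Array;
  typedef Value (*BlockFn)(Value elem);

  /* Opaque path: the whole iteration disappears into one native call
     that the JIT cannot see into. */
  extern Value array_each_native(Array *ary, BlockFn block);

  /* Intrinsic path: roughly what the call IC could emit inline, leaving
     only the block invocation as a call (itself behind its own IC, so
     the loop body can be inlined too). */
  static void array_each_inline(Array *ary, BlockFn block)
  {
      for (long i = 0; i < ary->len; i++)
          block(ary->elems[i]);
  }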


I think the problem with the intrinsics approach is that lots of this code isn't stdlib but extensions like: https://github.com/ged/ruby-pg/blob/6853309b64852755daed3e9c...

I don't really see a good solution here other than lifting the control flow to Ruby with specific bytecode ops to minimize overhead and a good, typed FFI like LuaJIT has? Am I missing something?


You could establish an interface for those extensions to participate in the JIT and ICs. JSC does that for the DOM.

Other than that just try to make calls into JITed code as fast as possible. But that needs to be the backup plan, with the main plan being that perf-sensitive extensions play with the JIT.


But we have no control over these C extensions. And there's a huge corpus of existing C extensions that need to keep working. 500 million lines of C and C++, in fact, even for just the publicly available code.

And by the way - there isn't really any proper interface at all! There's just basically the whole internals of the C Ruby implementation exposed to C extension authors. There's no abstraction with handles and things like that.

There was an effort at one point to get C extension authors to use the FFI instead, and a JNI-style API to permit a moving garbage collector was proposed, but these approaches weren't successful in gaining any momentum.
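
To illustrate the difference (RARRAY_PTR and RARRAY_LEN are real CRuby macros; the handle-based API below is hypothetical, in the spirit of the JNI-style proposal):

  #include <ruby.h>

  void example(VALUE ary)
  {
      /* Today: extensions take raw pointers into the heap, which pins
         the object layout and blocks a moving GC. */
      VALUE *elems = RARRAY_PTR(ary);
      long len = RARRAY_LEN(ary);

      /* A JNI-style alternative (hypothetical names): opaque handles
         plus accessors, leaving the VM free to move objects. */
      rb_handle_t h = rb_handle_new(ary);
      long len2 = rb_handle_array_len(h);
      VALUE first = rb_handle_array_aref(h, 0);
      rb_handle_free(h);
  }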

All in all... that's why we don't just do it the JavaScriptCore way. We have different constraints to yours.


You don’t need to make all native extensions fast. You need to make some of the most common things fast and keep the rest of them working. It’s fine if not all extensions adopt a JIT interface.

The thing about moving GC is a red herring. Ruby wouldn’t benefit from it. JSC doesn’t use moving GC.


I think for CRuby to use the JSC approach, we would need to translate methods written entirely in C (which opaquely iterate over collections and invoke Ruby blocks) into Ruby that uses bytecodes to get an iterator and then iterate?


It looks like MIR's approach to this is to build a JIT compiler that has access to the C source of the standard library and can inline it into the JITted code of Ruby functions, and vice versa. Building a compiler that can handle mixed-language input might be easier than rewriting the standard library to all be in one language.


"inlining everything" is actually maybe not the best approach. It's been talked about a lot, but it's not always helpful. For example, C compilers (or LLVM) are unable to optimize boxing and then unboxing a double into a Ruby Flonum and back.


Exactly. You need a compiler that can reason about Ruby semantics. LLVM can’t. Not sure MIR will do any better.

The key is:

- large opcode set and an architecture that tries to amortize the pain of lots of opcodes.

- excellent support for speculation and effects analysis.
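
A tiny illustration of the second point (invented names): each core-library opcode carries the effect facts the optimizer needs in order to reorder, deduplicate, or speculate around it.

  typedef enum {
      OP_ARRAY_PUSH,       /* writes receiver, may allocate */
      OP_STRING_LENGTH,    /* pure: reads receiver only */
      OP_HASH_LOOKUP,      /* may call user-defined hash/eql? */
  } Opcode;

  typedef struct {
      Opcode   op;
      unsigned reads_heap    : 1;   /* observes mutable state */
      unsigned writes_heap   : 1;   /* clobbers earlier loads */
      unsigned may_allocate  : 1;   /* interacts with the GC */
      unsigned may_call_ruby : 1;   /* can reenter arbitrary Ruby code */
  } OpcodeInfo;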


And maybe it is. GP, chrisseaton, did that in Truffle/Graal, and it's the fastest Ruby there is.


I don’t think CockroachJIT is taken...

(for the uninitiated, I kid. Sample of previous naming controversy at https://news.ycombinator.com/item?id=14309903)


From GitHub [1]:

  "Plans to try MIR light-weight JIT first for CRuby or/and MRuby implementation"
  "MIR is strongly typed"
Is there an explanation of how the project bridges the gap between dynamically-typed Ruby and statically-typed MIR?

More generally, I'd love to see something like MRuby+MIR be successful. It would be great to see an alternative to the aging LuaJIT.

[1] https://github.com/vnmakarov/mir


Seems to me that MIR operates at a (much) lower level, basically abstracting away the physical machine and its finite register set. As such, it would replace the LLVM (or GCC) middle and back ends. The goal is much faster compilation without sacrificing more than ~20% of performance.

Dynamic types and garbage collection would then be implemented on/for the abstract MIR machine.


> Is there an explanation of how the project bridges the gap between dynamically-typed Ruby and statically-typed MIR?

My understanding is that MIR doesn’t tackle this problem, so the code does stay pretty dynamic when compiled, just like its sister project YARV MJIT.

They both need an intermediate profiling mode to specialise and monomorphise but nobody is building that as far as I know.


It's probably like how V8 works. Even though JS is dynamically typed, V8 keeps internal static type definitions around based on what it sees at runtime, and traps out internally to the slow path, rather than throwing type errors, when the types don't match what it expects.
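
Roughly, in C (the tagging follows CRuby's Fixnum convention, where a low bit of 1 marks a tagged integer; the names are invented):

  #include <stdint.h>

  typedef uintptr_t VALUE;

  extern VALUE generic_add(VALUE a, VALUE b);    /* full dynamic dispatch */

  static int is_fixnum(VALUE v) { return v & 1; }

  static VALUE speculative_add(VALUE a, VALUE b)
  {
      if (is_fixnum(a) && is_fixnum(b)) {        /* guard: types as profiled */
          long r = ((long)a >> 1) + ((long)b >> 1);
          /* a real JIT would also guard against overflow here */
          return (VALUE)((r << 1) | 1);
      }
      return generic_add(a, b);                  /* trap to the slow path */
  }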


Interesting. This seems to be the first modern non-trivial compiler that avoids using SSA internally. I thought SSA had cornered the market, but perhaps now there is an opposite trend (OK, one example doesn't make a trend).

From the project's README:

> No SSA (single static assignment form) for:
>
> - Faster optimizations for a short optimizations pipeline and small functions (a target usage scenario)
>
> - Currently SSA could be used only for two optimizations (CCP and GCSE), and SSA usage would mean 4 additional passes over the IR. If more optimizations are implemented, an SSA transition is possible once the additional time for the expensive into/out-of-SSA passes is less than the additional time of non-SSA optimization implementations
>
> - Simpler and more compact generator code, because we can avoid implementing a lot of nontrivial code (dominator and dominance frontier calculation, good out-of-SSA translation)
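
(For context: SSA renames variables so that each has exactly one definition, inserting phi nodes where control flow merges; constructing and later deconstructing that form is where the extra passes come from. A tiny pseudo-IR example:)

  ; non-SSA, as MIR keeps it: one name, two definitions
      if cond goto L1
      x = 1; goto L2
  L1: x = 2
  L2: use(x)

  ; SSA form: each name defined once, a phi merges at the join
      if cond goto L1
      x1 = 1; goto L2
  L1: x2 = 2
  L2: x3 = phi(x1, x2); use(x3)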


> Seems to be the first modern non-trivial compiler that avoids using SSA internally.

Of course it's debatable what "modern" and "non-trivial" mean, and whether they apply to this project. It's very small, but on the (one!) small benchmark the author cites it seems to do quite well, so it's certainly not completely naive.

For whatever it's worth, CompCert doesn't use SSA either, and that's certainly a non-trivial compiler, though arguably the non-triviality does not stem from any advanced optimizations it does.


There are quite a few decent Ruby implementations in active development at the moment.

CRuby with MJIT

MIR ( This )

JRuby ( Ruby on JVM )

TruffleRuby ( Ruby on Graal )

Artichoke ( Ruby on Rust )

And I remember someone mentioning a Ruby with a tracing JIT (not Topaz). Unfortunately my Google-fu is not good enough and I can no longer find it.


Rubinius. Here are Evan’s slides from the 2009 LLVM developer conference:

https://llvm.org/devmtg/2009-10/Phoenix_AcceleratingRuby.pdf

EngineYard took Rubinius in a few directions, but I think the main lasting impact was all the RSpec work they did along the way.


Edited my original post. I meant implementations that are still actively developed. Both Rubinius and Topaz are no longer being maintained.


I believe Rubinius _is_ being maintained (again), it's just going in a direction very different from Ruby.


Yes, there are quite a few Ruby implementations, but MIR is not one, nor is it trying to be. It's a standalone library useful for building a tier-1 JIT for any dynamic language, along with other native code generation uses outside of JITs. The author does have plans to integrate it into CRuby to work alongside MJIT.


I would be quite interested in an IR/JIT assembler specialized for vector instructions with ARM Neon and x86-64 SSE~AVX512 output.

Ideally it would handle register allocation and caching of generated functions as well. The current JIT assemblers (asmjit, Xbyak) require you to handle register allocation yourself. LLVM is, as mentioned, quite a heavy dependency to have.


asmjit has a register allocator - for sure not the highest quality one, but it's there in asmjit's Compiler infrastructure.


How about using GCC's NEON intrinsics from C?


The article stresses how their JIT is much more lightweight than GCC/LLVM, which is perfectly valid, but why not compare MIR against the other portable lightweight JIT engines out there? They're not the first to think of it.

The article mentions Cranelift, but that's a 'middleweight JIT' with a proper SSA IR. I was surprised to see LibJIT has more LOC than Cranelift - I thought it was lighter. (Imperfect proxy for runtime 'weight', of course.)

If you want a lightweight portable JIT engine, there's already GNU Lightning [0], and the atrociously-named Lightening fork [1] (used in the new JIT in the GNU Guile Scheme interpreter, which turned up on the HN front page recently).

Here's a 1996 paper (preprint) on a research JIT named VCODE which executed around 8 instructions to generate each instruction in its output. [2] (Sadly it was never released, as far as I can tell, and is presumably long dead.)

Anyway, with all that said, I wish this project well. No-one's managed to get good performance out of Ruby yet, so it's certainly ambitious. Google gave up on Unladen Swallow, and that was a JIT for Python, which, as I understand it, is more amenable to JIT than Ruby. Even failing that, having a quality rival to GNU Lightning would be worthwhile.

[0] https://www.gnu.org/software/lightning/

[1] https://www.wingolog.org/archives/2019/05/24/lightening-run-...

[2] http://www-leland.stanford.edu/class/cs343/resources/vcode-a...


How is this meaningfully different in scope and intention vs Parrot? That project went on for a long time until every language (including its original target, Raku) decided they’d rather build their own more specialized JIT. What would prevent MIR from meeting the same fate?


MIR is much more flexible. It's like Cranelift but more basic.

Anything you write in MIR can be compiled by MIR. Parrot only JITs Parrot bytecode, so the applications are much more limited.


> implement the GCC C extensions necessary for the CRuby JIT implementation

That's the point where "C is nice and simple, it's easy to whip up a compiler" invariably turns into "why the #^(&)# didn't I use an existing frontend?". Real-world C code is messy. The entire sub-project of implementing a C compiler is a needless distraction that will turn into a huge time-suck with zero benefit to the author.

See also: "Why Says C is Simple?" https://people.eecs.berkeley.edu/~necula/cil/cil016.html


Good analysis of memory usage and how that rules out use in mobile and IoT projects.

Ruby was my go-to language for a long dry spell when I had few Lisp development jobs (except for Clojure). Ruby is a great language; as Matz says, Ruby is designed for developer happiness. I stopped using Ruby when more Common Lisp work came my way, and then I used Python for five years of deep learning work. I have favorite languages, but I used what customers wanted.

That said, I still keep up with Ruby news.


Hey, the Mir display server is still alive :)


Yet another drastic pivot!


Hey everyone, please stop naming things Mir for a while. Thanks!


> https://en.wikipedia.org/wiki/Mir_(disambiguation)#Science_a...

I feel like this list is missing some more, but yeah, there's been a few 'Mir' projects.



I was thinking of Rust's MIR, which is missing too:

https://blog.rust-lang.org/2016/04/19/MIR.html


> That is, we are introducing a new intermediate representation (IR) of your program that we call MIR: MIR stands for mid-level IR, because the MIR comes between the existing HIR ("high-level IR", roughly an abstract syntax tree) and LLVM (the "low-level" IR).

It's so terribly confusing because LLVM itself defines a .mir (Machine IR) that is a syntax between its IR and the backend (which could be thought of as mid-level IR). When I heard about Rust's MIR, I assumed it was some clever way to generate target-dependent IR.

The good news is that we now have M(L)IR: the one IR to bring them all and in the darkness bind them.


If anything I expect MLIR to make things more confusing as Rust’s MIR is probably going to be around for the long haul.


I was thinking of the Soviet space station Mir, which probably tells you my age...

https://en.wikipedia.org/wiki/Mir


Everyone thinks of the Mir space station first, no?


Probably most people past their mid-20s; I don't know that many schools teach about it.


That sounds right, anybody with memory of 2001 when its deorbit was in the news.


I was thinking of this Mir:

https://mir-server.io/




