
Does anyone have experience using MLIR vs. a custom IR which lowers to LLVM IR? I've worked with the latter.

On one hand MLIR seems like it would be nice to work with. LLVM IR is fairly low-level and hard to debug, but has a great API; it would be great to use such an API but keep things like type safety and invariants, and worry less about tricky undefined behavior. On the other hand, despite claims of extensibility and flexibility, it's hard to imagine it being easier to work with than a custom IR. Plus, it seems MLIR's extensibility was built to support some things in particular (custom architectures, "affine / polyhedral" optimizations).




i've worked on pytorch and mlir. pytorch has no less than 3 internal IRs and has tensorexprs, which lower to llvm ir. without a doubt mlir is preferable because, at a minimum, there's a bigger ecosystem of users/developers etc. that understand the compiler. so the code is clearer, the docs are better, there are way more features and tools. i really can't name a negative relative to pytorch, especially now that the one cute thing that pytorch had (codegen for boilerplate for the ir) has been implemented (better) in mlir.

> Plus, it seems MLIR's extensibility was built to support some things in particular

this isn't true in the least. just go to https://mlir.llvm.org/docs/Dialects/ and you'll see the set of upstream dialects is very diverse. and that's just upstream.


It’s not clear that all these dialects are actually helping things, or that a design built around making it easy to create new dialects is itself that helpful.

Why?

I attended an MLIR online meeting months back and there was someone presenting who had a diagram with literally a dozen different dialects and converters being written between various pairs of them (but not all MxN pairs, so it was a fairly arbitrary set of combinations being converted between, and mostly one-directional if I recall correctly). This was the “solution” to making various things interoperate, but from a distance it looks a lot like a fairly obvious problem caused by the very nature of MLIR’s design.


The main reason to make a new dialect is so that a specific type of optimization is easy to write in that dialect. For example, the affine dialect explicitly exists so that one can implement polyhedral loop optimizations. These were possible in, say, GCC, but it required "raising" an arbitrary low-level program (full of loops with mixed-in memory accesses, weird control flow, and even GOTOs) to identify the structural patterns needed to then implement polyhedral optimizations. It took many engineers over many years, even re-designing the entire GCC IR (into something called "GIMPLE"), to enable this, and IIUC it's still considered one of the more challenging parts of the GCC codebase to deal with.
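To make that concrete, here's a rough sketch of what a kernel looks like in the affine dialect (hypothetical example, and the exact op/type syntax varies between MLIR versions): loop bounds and subscripts stay explicit affine expressions of the induction variables, which is exactly the structure a polyhedral pass needs, and exactly what a CFG full of branches and pointer arithmetic forces you to reconstruct.

    // hypothetical kernel in the affine dialect: bounds and subscripts are
    // affine expressions, so dependence analysis and transforms like tiling,
    // interchange, and fusion can read the loop-nest structure directly
    func.func @scale(%A: memref<256x256xf32>, %B: memref<256x256xf32>) {
      %c2 = arith.constant 2.0 : f32
      affine.for %i = 0 to 256 {
        affine.for %j = 0 to 256 {
          %v = affine.load %A[%i, %j] : memref<256x256xf32>
          %s = arith.mulf %v, %c2 : f32
          affine.store %s, %B[%i, %j] : memref<256x256xf32>
        }
      }
      return
    }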

Then with MLIR you want to do ML optimizations, which require even _higher_ level constructs, like identifying linear algebra ops, so you can do stuff like tiling optimizations and lower to dedicated hardware instructions. The high-level MLIR dialects enable that information to be preserved as long as possible so that it's easier to implement all the magic.
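As a similarly hedged sketch (upstream linalg dialect, approximate syntax): "this is a matmul" stays a single op carrying its own semantics until some pass decides how to tile, vectorize, or lower it, instead of having to be re-discovered from three nested loops later on.

    // hypothetical high-level form: the matmul is preserved as one op, so
    // tiling / fusion / lowering to hardware intrinsics can pattern-match it
    // directly rather than reconstructing it from loop nests
    func.func @mm(%A: tensor<128x64xf32>, %B: tensor<64x32xf32>,
                  %C: tensor<128x32xf32>) -> tensor<128x32xf32> {
      %0 = linalg.matmul ins(%A, %B : tensor<128x64xf32>, tensor<64x32xf32>)
                         outs(%C : tensor<128x32xf32>) -> tensor<128x32xf32>
      return %0 : tensor<128x32xf32>
    }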


Yes, I completely understand the reason for the dialects. I’ve been working on compilers for decades, and what’s interesting about MLIR is that it’s a bit of an anachronistic approach. People used to write optimizers with multiple levels of IR, often based on the same data structures (but sometimes not), and they were effectively different dialects in the same sense as MLIR.

That fell out of favor due to the fact that you end up having to choose and commit to the phase ordering fairly early on and writing separate lowering steps to convert between dialects. So the tide turned toward having a single mid-level IR (and sometimes a single high-level IR for things like specific loop optimizations that was then lowered to that mid-level IR).


> So the tide turned toward having a single mid-level IR (and sometimes a single high-level IR for things like specific loop optimizations that was then lowered to that mid-level IR).

You realize this is only feasible if you have one team working on a compiler for one domain right? E.g. Rust's MIR is probably a good target for a systems language like Rust but a bad target for a SQL-like language.

>phase ordering fairly early on and writing separate lowering steps to convert between dialects.

I don't see how a single IR solves the phase ordering problem? LLVM IR is a single IR (not talking about backends) and yet you still have phase ordering problems.


> You realize this is only feasible if you have one team working on a compiler for one domain right?

It sounds like you think I’m advocating for something. I’m not. These are all just engineering trade-offs that depend on your goals.

Regarding phase ordering: A single IR allows you to freely reorder passes rather than having to reimplement them if you want to move them earlier or later in the phase order.


From an optimization perspective, such dialects are pretty much like the intermediate data structures that "single-IR"-style passes build internally anyway (e.g. various loop analyses), just in a shareable and more consistent (if less performant) form.

Single-IR passes from that perspective are roughly equivalent to MLIR-style `ir = loop_to_generic_dialect(my_loop_optimization(generic_to_loop_dialect(ir)))`.

This assumes the existence of bidirectional dialect transformations. Note that even LLVM IR, while a single IR, is technically multi-level as well; e.g. for instruction selection, it needs to be canonicalized & expanded first, and feeding arbitrary IR into that pass will result in an exception (or sometimes even a segfault, considering it is C++).

Also, even though passes for a single IR can theoretically be run in an arbitrary order, they are generally run in an order that can re-use (some) intermediate analysis results. This is, again, equivalent to minimizing the number of inter-dialect transformations in a multi-dialect IR.


> A single IR allows you to freely reorder passes rather than having to reimplement them if you want to move them earlier or later in the phase order.

...i'm not trying to be rude here but... that's not how any of this works...

you can search scholar.google.com for "phase ordering llvm ir" to find thousands of papers that demonstrate this.


I have no idea what you mean by “how any of this works”.

I didn’t bring up solving the phase ordering problem, you did.

I’m simply pointing out that if you have a compiler where you have multiple IRs or dialects of IR, and you have a pass that is written to work on IR “X”, and then at some point after that pass you translate to IR “Y”, if you want to move your pass after that point of translation, you either need to rewrite your pass so that it operates on “Y”, or you need to translate back to “X” again.
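A hedged MLIR-flavoured illustration of the same point (hypothetical snippets, approximate syntax): a pass written against structured loops in the affine dialect simply doesn't match once the loop has been lowered to blocks and branches, so moving it after that lowering means rewriting it against the lower-level form, or raising back up first.

    // "X": structured form that a loop pass can match directly
    func.func @zero_hi(%A: memref<128xf32>) {
      %zero = arith.constant 0.0 : f32
      affine.for %i = 0 to 128 {
        affine.store %zero, %A[%i] : memref<128xf32>
      }
      return
    }

    // "Y": the same function after lowering to explicit blocks and branches;
    // the structured-loop pass no longer applies at this point in the pipeline
    func.func @zero_lo(%A: memref<128xf32>) {
      %zero = arith.constant 0.0 : f32
      %c0 = arith.constant 0 : index
      %c1 = arith.constant 1 : index
      %c128 = arith.constant 128 : index
      cf.br ^head(%c0 : index)
    ^head(%i: index):
      %cond = arith.cmpi slt, %i, %c128 : index
      cf.cond_br %cond, ^body, ^exit
    ^body:
      memref.store %zero, %A[%i] : memref<128xf32>
      %next = arith.addi %i, %c1 : index
      cf.br ^head(%next : index)
    ^exit:
      return
    }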


> I didn’t bring up solving the phase ordering problem, you did.

what? the first mention of phase ordering in this thread is due to you

>That fell out of favor due to the fact that you end up having to choose and commit to the phase ordering fairly early on


I was unaware! Thanks for that context.



