Thanks for sharing these articles. I've been interested in MLIR ever since learning that the Mojo language would integrate with it [1], but it's been hard to find reference material geared towards beginners.
Does anyone have experiences using MLIR vs. a custom IR which lowers to LLVM IR? I've worked with the latter.
On one hand, MLIR seems like it would be nice to work with. LLVM IR is fairly low-level and hard to debug, but has a great API; it would be great to use such an API while keeping things like type safety and invariants, and worrying less about tricky undefined behavior. On the other hand, despite the claims of extensibility and flexibility, it's hard to imagine it being easier to work with than a custom IR. Plus, it seems MLIR's extensibility was built to support some things in particular (custom architectures, "affine / polyhedral" optimizations).
i've worked on pytorch and mlir. pytorch has no fewer than 3 internal IRs, plus tensorexprs, which lower to llvm ir. without a doubt mlir is preferable, because at a minimum it has a bigger ecosystem of users/developers etc. who understand the compiler. so the code is clearer, the docs are better, and there are way more features and tools. i really can't name a negative relative to pytorch, especially now that the one cute thing pytorch had (codegen for the ir boilerplate) has been implemented (better) in mlir.
> Plus, it seems MLIR's extensibility was built to support some things in particular
this isn't true in the least. just go to https://mlir.llvm.org/docs/Dialects/ and you'll see the set of upstream dialects is very diverse. and that's just upstream.
It’s not clear that all these dialects are actually helping, or that a design whose main virtue is making new dialects easy to create is itself that valuable.
Why?
I attended an MLIR online meeting some months back, and one presenter had a diagram with literally a dozen different dialects and the converters being written between various pairs of them (not MxN, so a fairly arbitrary set of combinations, and mostly one-directional if I recall correctly). This was the “solution” for making the various pieces interoperate, but from a distance it looks a lot like a fairly obvious problem caused by the very nature of MLIR’s design.
The main reason to make a new dialect is so that a specific type of optimization is easy to write in that dialect. For example, the affine dialect explicitly exists so that one can implement polyhedral loop optimizations. These were possible in, say, GCC, but they required "raising" an arbitrary low-level program (full of loops with interleaved memory accesses, weird control flow, and even GOTOs) to identify the structural patterns needed to implement polyhedral optimizations. It took many engineers many years, including re-designing the entire GCC IR (into something called "GIMPLE"), to enable this, and IIUC it's still considered one of the more challenging parts of the GCC codebase to deal with.
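To make that concrete, here's a small hand-written sketch (mine, not from the thread) in MLIR's affine dialect. The trip count and the subscripts are explicit in the ops themselves, which is exactly the structure a polyhedral pass needs, rather than something to re-discover from compares and branches:

```mlir
// In a flat IR this would be basic blocks, a compare, and a backedge;
// here the bounds and the affine subscripts are first-class.
func.func @saxpy(%a: f32, %x: memref<128xf32>, %y: memref<128xf32>) {
  affine.for %i = 0 to 128 {
    %xi = affine.load %x[%i] : memref<128xf32>
    %yi = affine.load %y[%i] : memref<128xf32>
    %p = arith.mulf %a, %xi : f32
    %s = arith.addf %p, %yi : f32
    affine.store %s, %y[%i] : memref<128xf32>
  }
  return
}
```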
Then with MLIR you want to do ML optimizations, which require even _higher_-level constructs, like identifying linear algebra ops, so you can do things like tiling and lowering to dedicated hardware instructions. The high-level MLIR dialects enable that information to be preserved as long as possible, so that it's easier to implement all the magic.
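For instance, the linalg dialect keeps an entire matrix multiply as one op (again a hand-written sketch, with shapes and names of my choosing, not something from the articles):

```mlir
// The whole matmul stays a single op, so a tiling pass can carve it into
// hardware-sized blocks before anything gets lowered to loops.
func.func @mm(%A: tensor<64x64xf32>, %B: tensor<64x64xf32>,
              %C: tensor<64x64xf32>) -> tensor<64x64xf32> {
  %0 = linalg.matmul ins(%A, %B : tensor<64x64xf32>, tensor<64x64xf32>)
                     outs(%C : tensor<64x64xf32>) -> tensor<64x64xf32>
  return %0 : tensor<64x64xf32>
}
```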
Yes, I completely understand the reason for the dialects. I’ve been working on compilers for decades, and what’s interesting about MLIR is that it’s a bit of an anachronistic approach. People used to write optimizers with multiple levels of IR, often based on the same data structures (but sometimes not), and they were effectively different dialects in the same sense as MLIR.
That fell out of favor because you end up having to choose and commit to the phase ordering fairly early on and writing separate lowering steps to convert between dialects. So the tide turned toward having a single mid-level IR (and sometimes a single high-level IR for things like specific loop optimizations that was then lowered to that mid-level IR).
> So the tide turned toward having a single mid-level IR (and sometimes a single high-level IR for things like specific loop optimizations that was then lowered to that mid-level IR).
You realize this is only feasible if you have one team working on a compiler for one domain, right? E.g. Rust's MIR is probably a good target for a systems language like Rust but a bad target for a SQL-like language.
> phase ordering fairly early on and writing separate lowering steps to convert between dialects.
I don't see how a single IR solves the phase ordering problem? LLVM IR is a single IR (not talking about backends) and yet you still have phase ordering problems.
> You realize this is only feasible if you have one team working on a compiler for one domain right?
It sounds like you think I’m advocating for something. I’m not. These are all just engineering trade-offs that depend on your goals.
Regarding phase ordering: A single IR allows you to freely reorder passes rather than having to reimplement them if you want to move them earlier or later in the phase order.
From an optimization perspective, such dialects are pretty much like the intermediate data structures that "single-IR"-style passes build internally anyway (e.g. various loop analyses), just in a sharable and more consistent (if less performant) form.
Single-IR passes from that perspective are roughly equivalent to the MLIR-style `ir = loop_to_generic_dialect(my_loop_optimization(generic_to_loop_dialect(ir)))`.
This assumes the existence of bidirectional dialect transformations. Note that even LLVM IR, while a single IR, is technically multi-level as well; e.g. for instruction selection it needs to be canonicalized & expanded first, and feeding arbitrary IR into that pass will result in an exception (or sometimes even a segfault, considering it is C++).
Also, even though passes for a single IR can theoretically be run in an arbitrary order, they are generally run in an order that can re-use (some) intermediate analysis results. This is, again, equivalent to minimizing the number of inter-dialect transformations in a multi-dialect IR.
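To make the roundtrip in that pseudocode concrete, here's a hedged, hand-written sketch (the function names and shapes are mine) of the same loop at both levels: the generic scf form a "single-IR" pass would see, and the affine form a loop pass would rather operate on:

```mlir
// Generic level: scf.for records the loop structure and nothing more.
func.func @scale(%buf: memref<?xf32>, %n: index, %a: f32) {
  %c0 = arith.constant 0 : index
  %c1 = arith.constant 1 : index
  scf.for %i = %c0 to %n step %c1 {
    %v = memref.load %buf[%i] : memref<?xf32>
    %s = arith.mulf %v, %a : f32
    memref.store %s, %buf[%i] : memref<?xf32>
  }
  return
}

// "Loop dialect" level: the same loop raised to affine.for, where bounds
// and subscripts are affine expressions an analysis can read off directly.
func.func @scale_affine(%buf: memref<?xf32>, %n: index, %a: f32) {
  affine.for %i = 0 to %n {
    %v = affine.load %buf[%i] : memref<?xf32>
    %s = arith.mulf %v, %a : f32
    affine.store %s, %buf[%i] : memref<?xf32>
  }
  return
}
```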
I have no idea what you mean by “how any of this works”.
I didn’t bring up solving the phase ordering problem, you did.
I’m simply pointing out that if you have a compiler where you have multiple IRs or dialects of IR, and you have a pass that is written to work on IR “X”, and then at some point after that pass you translate to IR “Y”, if you want to move your pass after that point of translation, you either need to rewrite your pass so that it operates on “Y”, or you need to translate back to “X” again.
two years ago MLIR was a big deal at the company I worked for, but then it went silent; the compiler team switched to something else for their RISC-V ML design, though I'm not sure what.
i don't understand the point of this criticism - i swear people just post it because they don't know what the thing is and they feel they need to compulsively comment anyway.
no one stumbles on this repo by accident (jeremy didn't post it here). the only way you find it is you google "mlir tutorial", i.e. you already know what MLIR is and you're actively, explicitly looking for a tutorial. there is no situation where you're ambushed by this content.
like if i write a tutorial "integration by parts for beginners" but don't define integration, are you gonna claim that it fails at being for beginners? does every title from now on need to explicitly say "... for beginners who know what this is and have already made the decision to start studying"?
It's funny, I agree with both of you. Considering this repo was posted to this site and prompted thousands of people to stumble on it by accident, even a link in the README on the words "MLIR framework" would go a long way. I, as someone who is both super interested in applied FHE and subscribed to this guy's blog, almost skipped this post because I didn't realize it was about FHE, and I am probably in the target audience for it.
> Considering this repo was posted to this site and prompted thousands of people to stumble on it by accident
but the guy himself didn't post it. how are people so entitled that they're upset about a deficiency in a thing that they sought out? no one is making you go to the repo (or asking you to go).
it's literally like complaining about free samples at costco - if you don't like what they have don't take one!
Taking the time to post a complaint that there's no info in a README is definitely not something I would choose to do, but if multiple people did it for my project, I might consider reflecting on the feedback and altering one line of README text to lower the friction for people checking out my project.
The Costco example is perfect, because if I were the sample manager or whatever and I overheard people rudely complaining to each other that they didn't like not knowing what was in the sample products, I might consider putting the ingredients list up next to the sample station. The goal is, after all, to get as many people to come check out the product as possible, right?
> The goal is, after all, to get as many people to come check out the product as possible, right?
the goal of this repo is to do something nice for people, not attract attention. i know that 100% if this was the reaction to my public service, i would immediately take the repo private because ungrateful people don't deserve to have nice things.
The first paragraph of the first article in the series says this:
"As we announced recently, my team at Google has started a new effort to build production-worthy engineering tools for Fully Homomorphic Encryption (FHE). One focal point of this, and one which I’ll be focusing on as long as Google is willing to pay me to do so, is building out a compiler toolchain for FHE in the MLIR framework (Multi-Level Intermediate Representation). The project is called Homomorphic Encryption Intermediate Representation, or HEIR."
By the way, the author of the GitHub repo has updated the README to have a link to the MLIR framework now. ¯\_(ツ)_/¯. Can't wait to read about it.
I should have taken a moment to see that the HN poster here and the repo owner are not the same person, and directed my comment to the owner rather than here - I leapt to the conclusion that they were one and the same.
He added a single link on one line, and that's all it took. I appreciate it when people recognize positive intent rather than assume someone is taking potshots.
Weird that it’s for beginners without saying what it actually is. Machine learning for infrared? Machine learning inference? Or is it talking about compilers (e.g. https://en.wikipedia.org/wiki/MLIR_(software))?
MLIR (Multi-Level Intermediate Representation) is an LLVM project made to address some of the shortcomings of LLVM IR. Many languages using LLVM as a backend (e.g. Swift, Rust) have resorted to defining their own IRs, because dropping all the way down to LLVM IR loses almost all of a language’s semantics.
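As a rough illustration of what "losing semantics" means (a hand-written sketch using MLIR's scf dialect, not tied to any particular frontend): the reduction below stays one structured op with an explicit loop-carried value, where LLVM IR would only show you basic blocks, a compare, and a phi node:

```mlir
func.func @sum_to_n(%n: index) -> index {
  %c0 = arith.constant 0 : index
  %c1 = arith.constant 1 : index
  // %acc is an explicit loop-carried value, not a phi node to reverse-engineer.
  %sum = scf.for %i = %c0 to %n step %c1 iter_args(%acc = %c0) -> (index) {
    %next = arith.addi %acc, %i : index
    scf.yield %next : index
  }
  return %sum : index
}
```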
It’s pretty obviously about compilers. From the first article:
> As we announced recently, my team at Google has started a new effort to build production-worthy engineering tools for Fully Homomorphic Encryption (FHE). One focal point of this, and one which I’ll be focusing on as long as Google is willing to pay me to do so, is building out a compiler toolchain for FHE in the MLIR framework (Multi-Level Intermediate Representation). The project is called Homomorphic Encryption Intermediate Representation, or HEIR
Your definition of obvious is quite strange. I have to go from HN to the github repo to another link (the right one) to figure out what the hell is going on.
> This is the code repository for a series of articles on the MLIR framework.
And then has links to the (currently written) articles, the first one with "(Getting Started)" in the title. Seems like an obvious place to, you know, start reading about things.
I mean, as opposed to putting 2 lines of markdown explaining what the project is about in the github repository, instead of forcing another level of indirection on the user. Seems like an obvious place to, you know, start reading about things.
look it's really simple: if this is too difficult for you to understand, no one is actually forcing you to learn any of it. on the contrary, if you're motivated enough, you'll figure out the answers to these and other trivial questions before starting the tutorial.
exactly - we're literally sitting here engaging with this via an interface/medium that lets you immediately look up the definition of any word/phrase/etc.
their tablegen still munges the trees manually. a proper tablegen would give you easy-to-write lhs -> rhs tree rules, and wouldn't let you shoot yourself in the foot.
see proper lisp/scheme matchers. even the gimple matcher hack is far better.
I'm finding the Toy tutorial easier to read, and easier to use as a jumping off point into the MLIR reference docs. I thought the writing style in this tutorial was a bit dense, and Bazel seems like an unnecessary complication. Only an opinion though, YMMV.
I tried to learn from the toy tutorial and found that it jumps into the deep end too quickly, without enough explanation for me to start writing code (everything I touched in `toy` caused the build to break in mysterious ways).
Same opinion. I have little/no exposure to compilers (yes, I know, maybe I'm not the right target) and tried to follow Toy, and had to resort to a lot of wikipedia/google just to understand the concepts and terms. And after finishing it I was still a bit confused about how to even get started. I think it could be made a bit more beginner-friendly.
[1]: https://docs.modular.com/mojo/notebooks/BoolMLIR.html