> Julia doesn’t let you write transformations on its typed IR from user space, unless you want to ignore Julia’s native execution engine. There are hooks [...] but [...] they break pass composability with code generation using the native engine
Can you elaborate on that? What is pass composability? Are we talking about LLVM's IR, or is there another IR specific to Julia?
I’m talking about passes which operate on either (a) CodeInfo, a high-level IR which is the data interface for Julia’s abstract-interpretation-based type inference, or (b) IRCode, which is what Julia’s optimizer uses. Note: again, I haven’t looked recently, but this is what I remember from a number of months ago.
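For concreteness, here's a sketch of how you can poke at the first of those objects from the REPL using the standard reflection tools (CodeInfo is reachable this way; IRCode itself is internal to the compiler and has no stable user-facing entry point):

```julia
# Inspecting CodeInfo via standard reflection (exact printed
# output varies between Julia versions).
using InteractiveUtils

f(x) = x + 1

# Untyped lowered form: a Core.CodeInfo, before inference runs.
ci_lowered = @code_lowered f(1)
@assert ci_lowered isa Core.CodeInfo

# After inference: code_typed returns CodeInfo => return-type pairs.
ci_typed, rettype = only(code_typed(f, (Int,)))
@assert ci_typed isa Core.CodeInfo
@assert rettype === Int
```

These are inspection tools only; there's no corresponding blessed way to hand a *modified* CodeInfo back to the native execution engine, which is the gap being described above.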
When I discussed writing passes, I was referring to interacting with these two phases of the compiler (abstract interpreter and optimization). In practice, these two phases are interlinked.
Shuffling data between these two phases makes a lot of assumptions which are mostly opaque to users. Like I said, you can hack it, but it’s hard to learn what you need to know, and there’s no stable interface, no nice “this is how you write a transformation to operate on this IR” or “this is how you write a custom opt”.
In any case, I’m not totally convinced that it’s a good idea to expose this stuff to user libraries. Or, at least, it needs to be carefully thought about.
See some of the complaints about “magic” in this post for some of that. I’m just fascinated by this stuff for some weird reason.
I'm not the person you're responding to, but here's my take on it.
> What is pass composability?
Pass composability is something that comes up in Julia a lot because our custom compiler passes are often done in user space and have all sorts of interesting applications. The idea is just that we want to have multiple program transformations occurring at once.
I.e. suppose I'm using some sort of program transformation to turn regular code into derivative code with automatic differentiation (AD), but suppose I *also* want to perform a program transformation in order to generate, say, GPU code, or I want to perform a program transformation that replaces all my heap allocations with allocations into a bump allocator, or something else. One has to take care to make sure these different transformations can cooperate with each other. Hell, it can even happen when one wants to do higher-order AD that you have to stack two AD passes on top of each other.
One problem here is that layering passes on top of each other can cause a combinatorial explosion in the amount of generated code if things aren't being pruned or optimized between passes.
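To make the higher-order case concrete, here's a minimal sketch using operator-overloading dual numbers rather than an actual compiler pass (the `Dual` type and `derivative` helper are purely illustrative). The composability question is the same shape, though: the outer transformation has to cope with code the inner one already rewrote, and the nesting is where code size starts to blow up.

```julia
# Minimal forward-mode AD via dual numbers (illustrative only).
struct Dual{T} <: Number
    val::T   # primal value
    der::T   # derivative part
end

Base.:+(a::Dual, b::Dual) = Dual(a.val + b.val, a.der + b.der)
Base.:*(a::Dual, b::Dual) = Dual(a.val * b.val, a.der * b.val + a.val * b.der)
Base.one(::Dual{T}) where {T} = Dual(one(T), zero(T))

# "One pass": differentiate f at x by seeding the derivative part.
derivative(f, x) = f(Dual(x, one(x))).der

cube(x) = x * x * x          # cube'(x) = 3x^2, cube''(x) = 6x

first_d  = derivative(cube, 2.0)                      # → 12.0
# "Stacked passes": differentiating the derivative nests Dual{Dual{...}},
# so the outer transformation runs on already-transformed code.
second_d = derivative(x -> derivative(cube, x), 2.0)  # → 12.0
```

Each extra nesting level doubles the bookkeeping carried through every arithmetic operation, which is the toy version of the code-size explosion described above.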
_________________________________
> Are we talking about LLVM's IR or is there another specific to Julia?
The person you were talking to was referring to Julia's own untyped and typed IRs, respectively. Julia programs go through quite a few different forms of representation before they end up getting run. The pipeline looks like this:
1) String: Just a regular, unparsed string of text.
2) Expressions: This is a user-facing representation of parsed code that our macros operate on. At this level, all that's really been done is parsing and a bit of canonicalization. There is no name or scope resolution done at this level, and everything is in terms of trees.
3) Untyped IR: This is a not-so-user-facing intermediate representation of Julia code that is produced after an Expression tree gets linearized into SSA form. This has had name and scope resolution performed on it, but no type inference or optimization passes performed on it. Generated functions and various user-level compiler pass injection techniques are able to operate on this level of Julia representation.
4) Typed IR: This is actually the same object as untyped IR, just with slots that used to be empty filled in. It has had type inference performed on it, and many of our custom Julia optimization passes performed on it. The types here still correspond to Julia-level types. Ideally, we'd be doing user-level pass injections on this level of IR where types are resolved, performing optimizations using those types to prune down the amount of code, and then performing the next program transformation, and so on.
5) LLVM IR: The next step after we're done with the typed IR is to translate it down to LLVM IR. This involves replacing Julia types with LLVM types, and a bunch of other stuff. LLVM will then perform its own optimization passes (of our choice) on this IR. Some packages do program transformations on this level of code, for instance Enzyme.jl. One advantage of this is that the work can be easily shared with other LLVM-backed languages.
6) Assembly code: The LLVM IR then gets compiled to assembly, which involves yet more optimization and translation passes.
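You can watch a function move through most of these stages from the REPL with the standard reflection macros (printed details vary by Julia version):

```julia
# Walking the pipeline stages with standard reflection tools.
using InteractiveUtils

# 1) String -> 2) Expression tree
ex = Meta.parse("x + 1")
@assert ex isa Expr && ex.head == :call

g(x) = x + 1

# 3) Untyped IR: lowered CodeInfo in SSA form, no inference yet.
@code_lowered g(1)

# 4) Typed IR: the same CodeInfo after inference and Julia-level opts.
@code_typed g(1)

# 5) LLVM IR and 6) native assembly (both print to stdout).
@code_llvm g(1)
@code_native g(1)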