Mazeppa: A modern supercompiler for call-by-value functional languages (github.com/mazeppa-dev)
189 points by Jhsto 67 days ago | 18 comments



My PhD was on supercompilation for Haskell (thanks for the cite in the repo :-))

Cool to see that people are still working on it, but I think that the main barrier to practical use of these techniques still remains unsolved. The problem is that it's just so easy for the supercompiler to go into some crazy exponential blowup of function unfolding, making the compilation step take impractically long.

Even if you avoid a literal exponential blowup, you can easily end up generating tons of specializations that bloat your code cache but don't reveal any useful optimization opportunities or are infrequently used in practice. Similar performance problems also plague related techniques like trace-based JITs, even though the trace JIT happens at runtime and thus has access to strictly more information about the frequency with which a trace might be used.
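
The classic toy illustration of the runaway unfolding looks something like this (OCaml syntax here, not any particular supercompiler's input language):

    type nat = Z | S of nat

    let rec add x y =
      match x with
      | Z -> y
      | S x' -> S (add x' y)

    (* Driving `add x x` with x unknown produces an ever-growing sequence of
       configurations:
         add x x,  add x' (S x'),  add x'' (S (S x'')),  ...
       The second argument keeps growing, so no configuration is a renaming of
       an earlier one and nothing ties back; termination relies entirely on the
       whistle (e.g. homeomorphic embedding) firing and forcing a generalization. *)
    let double x = add x x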

You can try to use annotations like the @extract proposed in the article to control these problems, but it can be hard to predict in advance when this is going to occur.

One interesting research direction might be to use deep reinforcement learning to try to guide the generalization/termination decisions, where the reward is based on A) whether the unfolding leads to a tie-back later on, and B) to what extent the unfolding allowed deforestation/beta reduction to take place.


Thanks for commenting! Yes, automatically predicting whether it's beneficial to specialize a function is not implemented. Finding suitable heuristics to automatically emit annotations like the `@extract` we already have is listed under the "Further research" section.

This topic was brought up many times in the past. Most of the time, speculative approaches to detect blowups were suggested, but I haven't actually seen any attempts to predict them before supercompilation. For example, while I was writing Mazeppa, I saw many times that blowups occur because some function call yields insufficient information that is subsequently pattern-matched by another function; since the match cannot be resolved at compile time, a lot of code duplication takes place.

I'm leaning towards a kind of "abstract interpretation pass" before supercompilation to approximate "good" and "bad" interactions between functions. After all, given that offline analyses exist even for harder problems such as detecting non-termination, why can't we understand when supercompilation will give us desirable results?


Maybe this is naive, but couldn't you use the same approach with profiles collected at runtime, like PGO in LLVM/GCC? In my experience it can help quite a lot with guiding inlining, both by hand and via a compiler.


I don't think that's naive - it could definitely help ameliorate the issues. To my knowledge no one has tried this approach.

The other idea I had when I was working on this stuff was to do JIT supercompilation. Classically, supercompilers expand the whole graph at compile time, but it would be straightforward to delay the supercompilation process until the first time that the code is actually executed, which helps tame the problem of generating lots of specializations that may never actually be executed in practice.

(Choosing a generalization heuristic becomes more important in the JIT setting because you always have access to the full info about all parameters to the function you are specializing, and naively you would end up e.g. specializing sigmoid(x)=1/(1+exp(x)) for each value of x observed at runtime.)
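
For instance, the naive value-keyed specialization amounts to something like this (an OCaml sketch with made-up names, just to show the shape of the problem):

    let sigmoid x = 1.0 /. (1.0 +. exp x)

    (* Cache of "residuals" keyed by the concrete argument value: it grows
       without bound, and each residual is merely a constant, so nothing
       useful is gained. *)
    let cache : (float, float) Hashtbl.t = Hashtbl.create 16

    let sigmoid_jit x =
      match Hashtbl.find_opt cache x with
      | Some residual -> residual
      | None ->
          let residual = sigmoid x in   (* "specialize" on this exact value *)
          Hashtbl.add cache x residual;
          residual

A generalization heuristic would instead specialize on the shape of the argument rather than its exact value.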


Do current JIT implementations already have something to control specialization? I have heard many times, for example, that JavaScript engines will specialize hot paths.


To the best of my knowledge (which isn’t great), .NET has multiple compilation levels for functions: it starts with a pretty fast but naive compilation and spends more time after a method has been called a few times. It also, I believe, specialises everything all the time, but (probably because of the multi-level compilation) that doesn’t seem to be a practical problem.


That was such a good Readme. Wonderful explanations for those unfamiliar with the space, full examples, a bit of history/context. Thanks!


Thanks, I'm glad the explanations were helpful.


Is the manual annotation there because deciding whether or not a graph will blow up upon supercompilation by just trying, while checking for blowup during the process, doesn't work?

> In Mazeppa, we have call-by-value functions and call-by-name (call-by-need) constructors, which 1) makes deforestation possible and 2) preserves the original semantics of code.

Are there any other notable differences between this approach and call-by-value constructors? (I guess one might have to carry around the original context in closures for a little while, but presumably even that disappears in residual programs?)

[Stepan Razin lost his black hat during his ride, but the supercompiler has already removed Ivan Mazepa's?]


Checking for blowup sort of works; for example, this paper [1] suggests exactly this approach. However, I'm not leaning towards it for the following reasons:

1. The approach involves magic numbers to implement heuristics. They worked for the samples provided by the authors, but will they work for large programs?

2. I think that there's a smarter approach to detect blowups. Specifically, there can be "good" and "bad" interactions in the form `f(g(...), ...)`, where `f` and `g` are functions; an interaction is good if `g` produces information that is known at compile-time (to be subsequently deconstructed by `f`), and bad if it doesn't. If the interaction is bad, `f` and `g` should be transformed separately.
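
For example (OCaml-like syntax rather than Mazeppa's input language, just to illustrate the distinction):

    type nat = Z | S of nat

    let rec map f xs =
      match xs with
      | [] -> []
      | x :: rest -> f x :: map f rest

    (* Good interaction: the inner call always yields a known constructor
       ([] or ::), which the outer `map` immediately deconstructs, so the
       intermediate list can be deforested at transformation time. *)
    let good xs = map (fun n -> S n) (map (fun n -> S (S n)) xs)

    let rec nth xs i =
      match xs, i with
      | x :: _, Z -> x
      | _ :: rest, S i' -> nth rest i'
      | [], _ -> Z

    let case_nat n = match n with Z -> 0 | S _ -> 1

    (* Bad interaction: `nth` returns whatever happens to be stored in the
       list, so its constructor is unknown at transformation time; pushing
       `case_nat`'s match into every branch of `nth` only duplicates code
       without enabling any reduction. *)
    let bad xs i = case_nat (nth xs i)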

In general, I see manual annotations as a low-level mechanism that can be used for predictable supercompilation. We haven't yet implemented heuristics that would guarantee predictability, but this is on our priority list.

> Are there any other notable differences between this approach and call-by-value constructors?

Yes, lazy constructors must be implemented differently from eager ones. The standard approach is to enclose constructor arguments under an environment of values, just as we implement function closures. I don't think that supercompilation can remove all laziness from the constructors.
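
Roughly, the representation could look like this (my own OCaml sketch of that strategy, not Mazeppa's actual runtime):

    type expr =
      | Var of string
      | Ctr of string * expr list   (* constructor application *)
      | App of string * expr list   (* call-by-value function call *)

    type value =
      | VCtr of string * thunk list  (* arguments are suspended, not values *)
    and thunk = { env : (string * value) list; body : expr }

    (* Matching only needs the constructor tag; an argument is evaluated
       (forced) only when the program actually demands it. *)
    let rec eval env (e : expr) : value =
      match e with
      | Var x -> List.assoc x env
      | Ctr (tag, args) -> VCtr (tag, List.map (fun a -> { env; body = a }) args)
      | App (_f, _args) -> failwith "function calls omitted from this sketch"

    let force { env; body } = eval env body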

[1] Peter Jonsson and Johan Nordlander. 2011. Taming code explosion in supercompilation. In PEPM '11: Proceedings of the 20th ACM SIGPLAN Workshop on Partial Evaluation and Program Manipulation, 33-42. https://doi.org/10.1145/1929501.1929507


I wonder what the debugging experience is like; it must be pretty difficult to generate correct source maps.


Cool. How necessary is the type system? Could you make something similar for lispy languages?


Supercompilation doesn't depend on having a typed language - you can more or less derive a supercompiler mechanically from any operational semantics, typed or untyped.


The likely origin of the name, if you’re curious:

https://en.wikipedia.org/wiki/Mazeppa_(poem)


The name was inspired by Transcendental Étude No. 4 [1], as correctly suggested by the other comment.

[1] https://www.youtube.com/watch?v=MYSFTtYc4P4


It looks like those compositions were ultimately inspired by the poem. So we’re both right ;)

https://en.wikipedia.org/wiki/Cultural_legacy_of_Mazeppa


The first thing I thought of was this guy lol https://en.wikipedia.org/wiki/Ivan_Mazepa I support Ukraine btw.


Oh! I thought it was this:

https://youtu.be/CXGeOHdiHrE

TIL




