Superoptimizers attempt to find shorter instruction sequences. This may or may not produce faster code - sometimes you can make code faster by inserting nops!
The idea of superoptimization is to find missed optimizations, which a human can review and then implement directly.
Here we multiply an int by -2. LLVM emitted the multiply and compare, but Souper proved that no int multiplied by -2 can equal 1, so this could be optimized to 0.
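Something like this minimal C++ sketch shows the shape of the pattern (my reconstruction, not the actual trophy code):

```cpp
// My reconstruction of the kind of source that produces this pattern --
// not the actual trophy code. x * -2 is always even, so the comparison
// can never be true; Souper proves the whole expression folds to 0.
bool eq_one_after_mul_neg2(int x) {
    return x * -2 == 1;  // naive codegen: multiply + compare, foldable to false
}
```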
By its nature Souper becomes less effective over time as its findings are incorporated into LLVM, so most of the trophies are old!
Can it optimize sequences of shuffles?
If so, it'd be worth a try, even if it takes hours (it probably doesn't take that long?). I could let it run until it finds faster sequences than what I have now, and update my code.
But for it to be worth the time for me to figure out how to install and use it, I need some sort of expectation for what I'd get out of trying this instead of something else.
It's not an optimizer, so its runtime doesn't really matter. It's intended to find new rules (patterns) that can in turn be implemented in LLVM as new optimizations. So this thing is just a research step.
Are you sure about shorter? If you have a model of execution behaviour, you can build the 'best' instruction sequence for your operation. Not all superoptimizers are created equal.
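As a toy sketch of that point (made-up ops and latencies; a real superoptimizer proves equivalence, where this only tests a few inputs): with an execution model you pick the *cheapest* equivalent sequence, not the shortest one.

```cpp
#include <cstdint>
#include <cstdio>
#include <vector>

// Toy candidate "instruction": a name, a behaviour, and a latency from a
// made-up execution model. None of this is from any real superoptimizer.
struct Op { const char* name; uint32_t (*fn)(uint32_t); int latency; };

static uint32_t target(uint32_t x) { return x * 2; }  // operation to replace

int main() {
    std::vector<Op> candidates = {
        {"mul x, 2", [](uint32_t x) { return x * 2u; }, 3},
        {"shl x, 1", [](uint32_t x) { return x << 1; }, 1},
        {"add x, x", [](uint32_t x) { return x + x; }, 1},
    };
    const Op* best = nullptr;
    for (const Op& op : candidates) {
        bool same = true;  // testing, not proving, equivalence (toy shortcut)
        for (uint32_t x : {0u, 1u, 7u, 0x80000000u, 0xFFFFFFFFu})
            if (op.fn(x) != target(x)) { same = false; break; }
        if (same && (!best || op.latency < best->latency)) best = &op;
    }
    std::printf("best: %s (latency %d)\n", best->name, best->latency);
}
```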
I think I get what you're saying, but I think people would read too much into them. The results are very dependent on the type of program you have; I wouldn't be surprised if there were tens of percentage points of difference in benchmarks between various examples.
That's the point. Benchmarks would show what code is already squeezed dry by basic LLVM tools and what code could benefit from superoptimization, which (as other comments point out) is the only reasonable basis for deciding to give Souper a try.
It seems like this is one of those projects that isn't really interested in explaining anything to non-compiler-developers. Every time this link comes up, we get variations of the same discussion:
- Where are the benchmarks?
- Benchmarks aren't relevant; it's only for detecting possibilities.
- But it has some kind of direct replacement mode, why aren't there benchmarks for that?
> Performance of the SPEC benchmarks was also unevenly affected by Souper: five become faster and seven become slower. None of the differences was very large.
I'm not trying to denigrate their work or assert that it's not valuable research. My point is that the discoverability and accessibility of information are very low here, which is what leads to the frustration whenever this site gets posted.
The research is very likely under peer review, so the site will probably be revised once the paper is accepted and presented at a conference. Like everyone else, people in academia have more stuff to do than there are hours in the day. Be kind, please.
If there are any rustc devs in this thread, would it be possible to make a superoptimizer for Rust MIR? Doing optimizations on that pre-monomorphization representation always sounds so attractive.
I'm not sure if I'm interpreting the paper [0] correctly, but both the abstract and the quote below from the Experiments section sound like they're not going to aim for faster code, but for a meta goal (that I admit I don't understand) instead?
"The goal is to give compiler developers actionable advice about missing optimizations. To do this, someone uses
sclang to extract Souper left-hand sides from programs of
interest. Since Souper finds many more optimizations than
developers could reasonably implement, the crux is ranking
them in such a way that desirable optimizations are found
early in the list. We are not aware of any single best ranking
function, but rather we have created several such functions
that can be used together or separately"
When you develop compilers, you generally face a three-way tradeoff between runtime speed, generated code size, and compile time; you can't get all three simultaneously. For what's effectively a peephole pass, the main balance question is whether the increase in runtime speed is going to be worth the extra time spent compiling the code. If patterns don't crop up very frequently, and the resulting code has only marginal speedups, then it's not worth slowing everyone's compile time for that speedup.
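To make that compile-time cost concrete, here's a toy sketch of my own (not LLVM's InstCombine): every rule added to a peephole table gets matched against every instruction on every compile, so a rarely firing rule with a marginal payoff is pure compile-time overhead almost all of the time.

```cpp
#include <functional>
#include <vector>

// Toy IR node and peephole table -- my sketch, not any real pass.
struct Inst { enum { Mul, Shl, Add } op; long imm; };

using Rule = std::function<bool(Inst&)>;  // returns true if the rewrite fired

bool run_peephole(std::vector<Inst>& code, const std::vector<Rule>& rules) {
    bool changed = false;
    for (Inst& i : code)               // every instruction...
        for (const Rule& r : rules)    // ...is tested against every rule, so
            changed |= r(i);           // compile time grows with the rule
    return changed;                    // count even when rules rarely fire
}

int main() {
    std::vector<Rule> rules = {
        // strength-reduce x * 2 into x << 1
        [](Inst& i) {
            if (i.op == Inst::Mul && i.imm == 2) {
                i.op = Inst::Shl;
                i.imm = 1;
                return true;
            }
            return false;
        },
    };
    std::vector<Inst> code = {{Inst::Mul, 2}, {Inst::Add, 1}};
    run_peephole(code, rules);
}
```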
In the context of LLVM IR, there's a subtle issue which is even more important: LLVM IR is sufficiently divorced from actual hardware instructions that it's a poor basis for a cost model. I believe Souper largely uses instruction count as its cost model [1], which really falls down when you handle i1 values, since most processors can't do arithmetic on i1 values natively, and converting the result of a comparison instruction into one that can be reasoned about with boolean circuits can involve several extra instructions.
[1] Regehr's original blog post on the matter says instructions cost 1, constants and phis cost 0, and selects and div/rem cost 3. sexts/zexts/truncs aren't explicitly modeled, and so presumably also cost 0. So it's mostly instruction count.
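In other words, something roughly like this (my paraphrase of the costs described in [1], not Souper's actual code):

```cpp
#include <string>

// My paraphrase of the cost model described in [1], not Souper's actual code.
// Note the i1 problem above: an "and" on i1 costs 1 here, the same as any
// other instruction, even though it may lower to several machine instructions.
int souper_style_cost(const std::string& op) {
    if (op == "const" || op == "phi") return 0;
    if (op == "sext" || op == "zext" || op == "trunc") return 0;  // presumed, per [1]
    if (op == "select" || op == "sdiv" || op == "udiv" ||
        op == "srem" || op == "urem") return 3;
    return 1;  // everything else: add, mul, icmp, and/or on i1, ...
}
```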
Well, at least this one has code publicly available. (I can't be bothered to name and shame, but it's so annoying to print out a paper, read it, see that it looks good, and then realize there's no code to verify.)
It kind of happened with Erlang's HiPE, but then it fell back out because nobody upstream understood it well enough to keep it working through major version updates.
The scientific paper without a reproducible environment.
The deep learning models with no weights.
The optimizer without benchmarks.