Superoptimizers attempt to find shorter instruction sequences. This may or may not produce faster code - sometimes you can make code faster by inserting nops!
The idea of superoptimization is to find missed optimizations, which a human can review and then implement directly.
Here we multiply an int by -2. LLVM emitted the multiply and compare, but Souper proved that no int multiplied by -2 can equal 1, so this could be optimized to 0.
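Something like this minimal C++ sketch shows the shape of the pattern (my reconstruction, not the actual trophy code):

```cpp
// My reconstruction of the kind of source that produces this pattern --
// not the actual trophy code. x * -2 is always even, so the comparison
// can never be true; Souper proves the whole expression folds to 0.
bool eq_one_after_mul_neg2(int x) {
    return x * -2 == 1;  // naive codegen: multiply + compare, foldable to false
}
```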
By its nature Souper becomes less effective over time as its findings are incorporated into LLVM, so most of the trophies are old!
Can it optimize sequences of shuffles?
If so, it'd be worth a try, even if it takes hours (it probably doesn't take that long?). I could let it run until it finds faster sequences than what I have now, and update my code.
But for it to be worth the time for me to figure out how to install and use it, I need some sort of expectation for what I'd get out of trying this instead of something else.
It's not an optimizer, so its runtime doesn't really matter. It's intended to find new rules (patterns) that can in turn be implemented in LLVM as new optimizations. So this thing is just a research step.
Are you sure about shorter? If you have a model of execution behaviour, you can build the 'best' instruction sequence for your operation. Not all superoptimizers are created equal.
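As a toy sketch of that point (made-up ops and latencies; a real superoptimizer proves equivalence, where this only tests a few inputs): with an execution model you pick the *cheapest* equivalent sequence, not the shortest one.

```cpp
#include <cstdint>
#include <cstdio>
#include <vector>

// Toy candidate "instruction": a name, a behaviour, and a latency from a
// made-up execution model. None of this is from any real superoptimizer.
struct Op { const char* name; uint32_t (*fn)(uint32_t); int latency; };

static uint32_t target(uint32_t x) { return x * 2; }  // operation to replace

int main() {
    std::vector<Op> candidates = {
        {"mul x, 2", [](uint32_t x) { return x * 2u; }, 3},
        {"shl x, 1", [](uint32_t x) { return x << 1; }, 1},
        {"add x, x", [](uint32_t x) { return x + x; }, 1},
    };
    const Op* best = nullptr;
    for (const Op& op : candidates) {
        bool same = true;  // testing, not proving, equivalence (toy shortcut)
        for (uint32_t x : {0u, 1u, 7u, 0x80000000u, 0xFFFFFFFFu})
            if (op.fn(x) != target(x)) { same = false; break; }
        if (same && (!best || op.latency < best->latency)) best = &op;
    }
    std::printf("best: %s (latency %d)\n", best->name, best->latency);
}
```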
I think I get what you're saying, but I think people would read too much into them. The results are very dependent on the type of program you have; I wouldn't be surprised if there were tens of percentage points of difference in benchmarks between various examples.
That's the point. Benchmarks would show what code is already squeezed dry by basic LLVM tools and what code could benefit from superoptimization, which (as other comments point out) is the only reasonable basis for deciding to give Souper a try.
It seems like this is one of those projects that isn't really interested in explaining anything to non-compiler-developers. Every time this link comes up, we get variations of the same discussion:
- Where are the benchmarks?
- Benchmarks aren't relevant; it's only for detecting possibilities.
- But it has some kind of direct replacement mode, why aren't there benchmarks for that?
> Performance of the SPEC benchmarks was also unevenly affected by Souper: five become faster and seven become slower. None of the differences was very large.
I'm not trying to denigrate their work or assert that it's not valuable research. My point is that the discoverability and accessibility of information are very low here, which is what leads to the frustration whenever this site gets posted.
The research is very likely under peer review, so the site will probably be revised once the paper is accepted and presented at a conference. Like everyone else, people in academia have more stuff to do than there are hours in the day. Be kind, please.
If there are any rustc devs in this thread, would it be possible to make a superoptimizer for Rust MIR? Doing optimizations on that pre-monomorphization representation always sounds so attractive.
I'm not sure if I'm interpreting the paper [0] correctly, but both the abstract and the quote below from the Experiments section sound like they're not going to aim for faster code, but for a meta goal (that I admit I don't understand) instead?
"The goal is to give compiler developers actionable advice about missing optimizations. To do this, someone uses
sclang to extract Souper left-hand sides from programs of
interest. Since Souper finds many more optimizations than
developers could reasonably implement, the crux is ranking
them in such a way that desirable optimizations are found
early in the list. We are not aware of any single best ranking
function, but rather we have created several such functions
that can be used together or separately"
When you develop compilers, you generally face a three-way tradeoff between runtime speed, generated code size, and compile time; you can't get all three simultaneously. For what's effectively a peephole pass, the main balance question is whether the increase in runtime speed is going to be worth the extra time spent compiling the code. If patterns don't crop up very frequently, and the resulting code has only marginal speedups, then it's not worth slowing everyone's compile time for that speedup.
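To make that compile-time cost concrete, here's a toy sketch of my own (not LLVM's InstCombine): every rule added to a peephole table gets matched against every instruction on every compile, so a rarely firing rule with a marginal payoff is pure compile-time overhead almost all of the time.

```cpp
#include <functional>
#include <vector>

// Toy IR node and peephole table -- my sketch, not any real pass.
struct Inst { enum { Mul, Shl, Add } op; long imm; };

using Rule = std::function<bool(Inst&)>;  // returns true if the rewrite fired

bool run_peephole(std::vector<Inst>& code, const std::vector<Rule>& rules) {
    bool changed = false;
    for (Inst& i : code)               // every instruction...
        for (const Rule& r : rules)    // ...is tested against every rule, so
            changed |= r(i);           // compile time grows with the rule
    return changed;                    // count even when rules rarely fire
}

int main() {
    std::vector<Rule> rules = {
        // strength-reduce x * 2 into x << 1
        [](Inst& i) {
            if (i.op == Inst::Mul && i.imm == 2) {
                i.op = Inst::Shl;
                i.imm = 1;
                return true;
            }
            return false;
        },
    };
    std::vector<Inst> code = {{Inst::Mul, 2}, {Inst::Add, 1}};
    run_peephole(code, rules);
}
```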
In the context of LLVM IR, there's a subtle issue which is even more important: LLVM IR is sufficiently divorced from actual hardware instructions that it's a poor basis for a cost model. I believe Souper largely uses instruction count as its cost model [1], which really falls down when you handle i1 values, since most processors can't do arithmetic on i1 values natively, and converting the result of a comparison instruction into one that can be reasoned about with boolean circuits can involve several extra instructions.
[1] Regehr's original blog post on the matter says instructions cost 1, constants and phis cost 0, and selects and div/rem cost 3. sexts/zexts/truncs aren't explicitly modeled, and so presumably also cost 0. So it's mostly instruction count.
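In other words, something roughly like this (my paraphrase of the costs described in [1], not Souper's actual code):

```cpp
#include <string>

// My paraphrase of the cost model described in [1], not Souper's actual code.
// Note the i1 problem above: an "and" on i1 costs 1 here, the same as any
// other instruction, even though it may lower to several machine instructions.
int souper_style_cost(const std::string& op) {
    if (op == "const" || op == "phi") return 0;
    if (op == "sext" || op == "zext" || op == "trunc") return 0;  // presumed, per [1]
    if (op == "select" || op == "sdiv" || op == "udiv" ||
        op == "srem" || op == "urem") return 3;
    return 1;  // everything else: add, mul, icmp, and/or on i1, ...
}
```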
Well, at least this one has code publicly available. (I can't be bothered to name and shame, but it's so annoying to print out a paper, read it, see that it looks good, and then realize there's no code to verify.)
It kind of happened with Erlang's HiPE, but then it fell back out because nobody upstream understood it well enough to keep it working through major version updates.
The scientific paper without a reproducible environment.
The deep learning models with no weights.
The optimizer without benchmarks.