Super-optimizers always strike me the same way when I see examples like this. On the one hand, it's awesome to have truly optimal (under chosen assumptions) code sequences for certain operations. On the other hand, so many of the examples just hammer home how many practical opportunities are being missed by peephole optimizations (in this case) and other traditional, local optimization techniques. I almost feel like the most valuable use of a super-optimizer would be to run it against a bunch of real-world code patterns, and then feed the results back into those simple traditional optimizations. Something like the sketch below.
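The classic illustration is the signum function from Massalin's original superoptimizer paper: the superoptimizer discovers a branch-free instruction sequence, and that sequence can then be baked into the compiler as an ordinary peephole rule. A rough C rendering of the idea (assuming 32-bit two's-complement ints; the exact branchless expression is my illustration, not Massalin's machine code):

    #include <stdio.h>

    /* What a naive compiler emits: compares and conditional jumps. */
    int sign_naive(int x) {
        if (x > 0) return 1;
        if (x < 0) return -1;
        return 0;
    }

    /* The kind of branch-free sequence a superoptimizer finds.
       Assumes 32-bit int; the signed right shift is
       implementation-defined but arithmetic on mainstream targets. */
    int sign_branchless(int x) {
        return (x >> 31) | (int)(-(unsigned)x >> 31);
    }

    int main(void) {
        for (int x = -2; x <= 2; x++)
            printf("%2d -> %2d %2d\n", x, sign_naive(x), sign_branchless(x));
        return 0;
    }

Once verified, "sign of an int" becomes a pattern the compiler can match and rewrite locally, with no search needed at compile time.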
LLVM generates so much more code than GCC for that. Crazy - obviously the compiler is damned if it does, damned if it doesn't, but optimizing for small strings is probably not a bad idea.
Semi-related: does anyone know if there's been promising research into using LLMs for compilation? It seems like an LLM that can understand a codebase at higher levels of abstraction could (in theory) optimize it in ways traditional compilers can't.
I think using an LLM specifically would be wasteful and produce too many errors.
There are many research projects out there exploring different ways to use neural networks for compiler optimisation though.
My impression is that most of them use neural networks to replace the heuristics that find candidate solutions for local optimisation problems.
AlphaDev[1][2] seems to be the only project, in the news at least, that approaches low-level compilation research, and even there it covers only a small subset of optimizations.
It depends on what parameters you give the system control over. It doesn't necessarily have to generate the code itself; it can simply control one of the many other knobs. For example, if you only give it control over some dimensional parameters like command-line flags, or warp/block/thread scheduling parameters (e.g. in GPU programming), then you could ask questions like "Given this block of code, running on this hardware, what are the best compilation flags and build parameters?" A toy version of the flag-tuning idea is sketched below.
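Roughly this shape, as a sketch (the flag list, the bench.c target, and the timing harness are all assumptions, and it needs POSIX clock_gettime; a real system would search a far larger space and control for measurement noise):

    #define _POSIX_C_SOURCE 199309L
    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    /* Wall-clock seconds via a monotonic clock (POSIX). */
    static double now(void) {
        struct timespec ts;
        clock_gettime(CLOCK_MONOTONIC, &ts);
        return ts.tv_sec + ts.tv_nsec / 1e9;
    }

    int main(void) {
        /* The search space: a handful of flag sets to sweep. */
        const char *flagsets[] = {
            "-O2", "-O3", "-O3 -funroll-loops", "-O3 -march=native",
        };
        for (size_t i = 0; i < sizeof flagsets / sizeof *flagsets; i++) {
            char cmd[128];
            snprintf(cmd, sizeof cmd, "cc %s -o bench bench.c", flagsets[i]);
            if (system(cmd) != 0) {
                fprintf(stderr, "build failed: %s\n", flagsets[i]);
                continue;
            }
            double t0 = now();
            if (system("./bench") != 0)
                continue;
            printf("%-22s %6.3f s\n", flagsets[i], now() - t0);
        }
        return 0;
    }

The learned model would replace the exhaustive sweep: given the source (or some representation of it) plus a hardware description, it predicts which point in this space to try first.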
There was an interesting paper a while back, before the LLM craze, that basically did this for OpenMP programs. You'd feed it the source code of some loop body or whatever, and it would try to pick the right OMP block/loop scheduling parameters from there. Something like the knobs in the snippet below.
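Concretely, the schedule kind and chunk size in the pragma are exactly the per-loop parameters such a model predicts. The values here are made up for illustration (compile with cc -fopenmp):

    #include <stdio.h>

    #define N 1000000

    int main(void) {
        static double a[N];
        /* schedule(dynamic, 1024) is one point in the search space:
           static vs. dynamic vs. guided, plus the chunk size, are the
           knobs the model picks based on the loop body. */
        #pragma omp parallel for schedule(dynamic, 1024)
        for (int i = 0; i < N; i++)
            a[i] = (double)i * i / (i + 1.0);
        printf("a[N-1] = %f\n", a[N - 1]);
        return 0;
    }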
Naturally, if you're training it on open code, you will have to censor the AI so it doesn't use flags like -ffast-math too much. :)