Both AI and compilers are just software, and right now the optimizers are written by hand, which is kind of weird because the whole point of LLMs is to generate sequences of tokens that minimize some scalar-valued loss function. In the case of compilers, the input is high-level code (Python, or whatever language the tensor specifications are written in) expressing tensor operations, and the output is whatever combination of GPU kernels executes those operations as fast as possible while remaining formally equivalent to them. Everything in this loop has a well-defined input, a well-defined output, an associated scalar-valued metric (execution time), and even a normalization factor (output length, with shorter sequences being "better").
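
To make that "well-defined metric" concrete, here is a minimal, runnable Python sketch of such an objective: a candidate kernel is accepted only if it agrees numerically with the reference tensor program, and is then scored by wall-clock time plus a small length penalty so shorter equivalent kernels win ties. The function names, the timing loop, and the penalty weight are illustrative assumptions, not part of any existing compiler stack.

    import inspect
    import math
    import time

    import numpy as np

    def score_candidate(reference, candidate, inputs, length_penalty=1e-4):
        """Lower is better; crashing or non-equivalent candidates score +inf."""
        expected = reference(*inputs)
        try:
            got = candidate(*inputs)
        except Exception:
            return math.inf                          # candidate crashes
        if not np.allclose(expected, got, rtol=1e-4):
            return math.inf                          # not equivalent on this input
        start = time.perf_counter()
        for _ in range(10):                          # crude timing loop
            candidate(*inputs)
        runtime = (time.perf_counter() - start) / 10
        return runtime + length_penalty * len(inspect.getsource(candidate))

    # Example: a fused "multiply then reduce" candidate vs. the naive reference.
    def naive_spec(a, b):
        return (a * b).sum(axis=-1)

    def fused_candidate(a, b):
        return np.einsum("...i,...i->...", a, b)     # one fused contraction

    rng = np.random.default_rng(0)
    inputs = (rng.standard_normal((256, 512)), rng.standard_normal((256, 512)))
    print(score_candidate(naive_spec, fused_candidate, inputs))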

The whole thing seems obviously amenable to gradient-based optimization and data augmentation with synthetic code generators. It is surprising that no one seems to be pursuing such approaches to improving the kernel compilation/fusion/optimization pipeline, because it is just another symbol game, and one with much better-defined metrics than natural language modeling.
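
And here is a hedged sketch of the "synthetic code generator" half of that idea (the op set and names are made up for illustration, not taken from any real benchmark suite): randomly compose small tensor programs to get an unlimited stream of (source, reference callable) pairs that an optimizer can be trained against using the scoring function above, with no hand-labeled data required.

    import random

    import numpy as np

    ELEMENTWISE = ["a + b", "a * b", "np.maximum(a, b)", "np.tanh(a) * b"]
    REDUCTIONS = ["({x}).sum(axis=-1)", "({x}).mean(axis=-1)", "({x}).max(axis=-1)"]

    def synth_program(seed):
        """Return (python_source, callable) for a random two-stage tensor program."""
        rng = random.Random(seed)
        body = rng.choice(REDUCTIONS).format(x=rng.choice(ELEMENTWISE))
        src = "def spec(a, b):\n    return " + body + "\n"
        namespace = {"np": np}
        exec(src, namespace)                  # materialize the reference callable
        return src, namespace["spec"]

    # Example: one synthetic spec, evaluated on random inputs.
    src, spec = synth_program(seed=42)
    print(src)
    a = np.random.default_rng(0).standard_normal((4, 8))
    b = np.random.default_rng(1).standard_normal((4, 8))
    print(spec(a, b).shape)                   # (4,)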



