As someone who doesn't follow compilers too closely, is 3-7% considered good? Throwing a whole ML model into the compiler seems like a pretty large increase in complexity for a small gain in performance.
If you can fit 5% more functionality into a fixed size, that is nice. Especially if you only use this optimization level for the final production build that needs to be squeezed onto a small (and therefore cheaper) ROM.
Although I do agree that these days most projects will have media that takes up the majority of the space.
The stat that seems clearly impressive to me is that their register allocator claims "0.3% ~1.5% improvements in queries per second", which is a huge cost savings for operations at Google's scale. If you have 100 datacenters of software running, you can conceivably turn one of them off (or, more likely, slow down future building and expansion plans). Of course, for most people compute costs aren't a significant expense.
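A rough back-of-envelope under the same assumption (that a QPS gain translates directly into proportionally less serving capacity needed; the 100-datacenter fleet is just the hypothetical above, not a real figure):

```python
# Back-of-envelope: capacity freed = fleet size * fractional QPS improvement.
fleet_datacenters = 100  # hypothetical fleet size from the comment above

for qps_gain in (0.003, 0.015):  # the 0.3% and 1.5% endpoints of the claimed improvement
    capacity_freed = fleet_datacenters * qps_gain
    print(f"{qps_gain:.1%} QPS gain ~ {capacity_freed:.1f} datacenters' worth of capacity")
```

So the claimed range works out to somewhere between a fraction of a datacenter and about one and a half datacenters' worth of capacity for a fleet that size.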