The power throttling and voltage gating involved take a long time: at least microseconds, up to a few milliseconds. The scheduling decisions that compilers deal with play out over tens to hundreds of clock cycles, a difference of well over a factor of a thousand.
Sure, but it's not really a scheduling decision. I think the GP is correct inasmuch as compilers now have to make the hard choice of whether to use any AVX at all, and it's a global trade-off. Even though using a few 64-byte moves might be locally optimal, you now need a higher license and hence a slower CPU, and you can only evaluate whether that trade-off makes sense in the scope of the larger program: how much speedup do you get, and does it compensate for the lower frequency?
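To put rough numbers on that trade-off, here's a back-of-the-envelope sketch in Python. It's an Amdahl-style model, not anything a compiler actually implements, and it assumes (pessimistically) that the whole program runs at the reduced clock once the higher license is needed. The function names and the example figures are made up for illustration.

```python
def net_speedup(vector_fraction, vector_speedup, freq_ratio):
    """Whole-program speedup relative to scalar code at the full clock.

    vector_fraction: share of baseline runtime spent in vectorizable code
    vector_speedup:  per-cycle speedup of that code from wide vectors (> 1)
    freq_ratio:      clock under the higher license / baseline clock (< 1)
    All three are hypothetical inputs, not measured values.
    """
    # Only the vectorizable part needs fewer cycles...
    cycles = (1.0 - vector_fraction) + vector_fraction / vector_speedup
    # ...but in this model every cycle, scalar ones included, gets slower.
    return freq_ratio / cycles


def break_even_fraction(vector_speedup, freq_ratio):
    """Smallest vector_fraction for which net_speedup reaches 1.0."""
    return (1.0 - freq_ratio) / (1.0 - 1.0 / vector_speedup)


if __name__ == "__main__":
    # Made-up example: 2x speedup on the vector part, 15% clock penalty.
    print(net_speedup(0.05, 2.0, 0.85))
    print(break_even_fraction(2.0, 0.85))
```

Under those made-up numbers, a few wide moves covering 5% of the runtime come out as a net loss, and the vectorized part would have to cover roughly 30% of the runtime just to break even. That's the "global" part: nothing visible at the level of a single loop tells you which side of that line you're on.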
Curious: does any compiler implement any kind of general algorithm for "memory pressure"? For register allocation (hence register pressure) they do, I think, but the memory layout, at least in lower-level languages, is mostly fixed by the source, so I didn't think there was much flexibility there.
Doable, but someone will have to take the first jump.