A random thought just crossed my mind -- how about a genetic algorithm that experiments with all these options and evolves towards the best performance?
Run speed is objective and measurable, which is desirable for a fitness criterion.
Somehow, you'd have to specify which instructions can be moved around and which can't -- or maybe not, if you could also assess whether the program executed "successfully."
I don't really see it as practical, but it might be fun. Might uncover optimizations that no one has seen before.
Unfortunately in most modern OSes run speed on x86 is variable and profiling itself is intrusive. You might be able to get a general trendline toward a faster algorithm but it'll still be lost in statistical noise.
The second problem is that the underlying processor is a moving target. The x86 programming model doesn't directly map to hardware anymore, with out of order execution, branch predictors and register renaming. One variant of the processor with a particular caching algorithm might have totally different results than the next.
Run speed is objective and measurable, which is desirable for a fitness criterion.
Somehow, you'd have to specify which instructions can be moved around and which can't -- or maybe not, if you could also assess whether the program executed "successfully."
I don't really see it as practical, but it might be fun. Might uncover optimizations that no one has seen before.