"We postulate that the original study produced good results because it evaluated the new scheduler in a machine whose performance bottleneck lay outside the execution core. The problems introduced by the steering logic were, as a result, hidden."
Any new microarchitecture needs to first answer that big question -- how is the core going to deal with memory?
This doesn't decouple across function calls like the Mill claims to, right? That said, applying that optimization to existing C/C++ programs would be difficult.