
It’s limited, but in the argument-passing context you’re storing to a location that’s almost certainly in L1, and the called function then probably loads it back immediately. So the store will likely occupy a store-queue slot for only a few cycles before it retires.



Due to speculative out-of-order execution, it's not just "a few cycles". The LSU has a hard, small limit on the number of outstanding loads and stores (usually separate limits, on the order of 8-32 entries each), and once you fill that, you have to stop issuing until commit has drained them.

This discussion is yet another instance of the fallacy of "Intel has optimized for the current code, so let's not improve it!". Other examples include branch prediction (a correctly predicted branch has a small but nonzero cost) and indirect jump prediction. And this doesn't even begin to address implementations that might be less aggressive about making up for bad code (like most RISCs and RISC-likes).



