Hacker News

I don't understand what you are trying to say. The dependency chain length is what is normally meant by instruction latency.

Also the pipeline length is certainly not 5 stages but more like 20-30.




Sorry, I dropped all the 1s in that message when I typed it (laptop keyboard is a little sketchy right now). That should have been 15 and 18. I think the recent Intel microarchs take 14, plus however long uop decoding takes, which is at minimum 1, so 15-20 or close to that.

> The dependency chain length is what is normally meant by instruction latency.

Yes, the way I read the original post and others was that you actually get your result back in 3 cycles, which isn't correct. It doesn't get committed for a while (but following instructions can use the result even if it hasn't been committed yet). You're not getting a result in less than 20 cycles, basically.


> Sorry, I dropped all the 1s in that message when i typed it

It makes sense now! :D

> You're not getting a result in less than 20 cycles basically

But the end of the pipeline is an arbitrary point. It will take a few more cycles to get to L1 (when it makes it out of the write buffer), a few tens more to traverse L2 and L3, and hundreds to get to RAM (if it gets there at all). If it has to reach a human it will take thousands of cycles to get through the various buses to a screen or similar.

The only reasonably useful latency measure is what it takes for the value to be ready to be consumed by the next instruction, which is indeed 3-5 cycles depending on the specific microarchitecture.


> which is indeed 3-5 cycles depending on the specific microarchitecture

I assume you are talking about from fetch to hitting the store buffer? That would be the absolute minimum time before the data could be seen elsewhere, I would think. It can still potentially be rolled back, and that rate would be way too fast to sustain, but for a single-instruction burst, I'm not sure. So much happens at the same time. An L1 read hit will cost you 4 cycles minimum, but all but 1 of that is hidden. You can't avoid the multiply cost of 3 or the add cost of 1. The decoding, uop cache hit, reservation, etc. will cost a few. I have no idea.

If you know of anything describing it in that much detail, I would be extremely curious.





