The parent's ROB *is* the reorder buffer. AIUI it causes the instructions to be ...

Tuna-Fish · on July 6, 2019

Your alternative is sort of close. What happens is that every x86 instruction is assigned a ROB entry. Every uop that has results (stores are handled separately) is assigned a clean register out of the physical register file, and the address of this register (or multiple registers in case of multi-uop instructions) is stored in the corresponding ROB entry. The ROB acts like an in-order circular list -- the retire phase drains it in order from the oldest first, retiring the oldest instruction if and only if all the corresponding PRF registers have been written to. This is the point where any and all side-effects are made visible.
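A rough Python sketch of the retirement rule described above, for anyone who wants to poke at it. Everything here is an illustrative assumption rather than how any real core does it: the class and method names are made up, the ROB is a small fixed-size ring, PRF "registers" are just integer tags that count up and are never reclaimed, and stores are ignored (an entry with zero result uops simply retires once it is the oldest).

```python
class ReorderBuffer:
    """Toy ROB: a fixed-size ring of entries, one per x86 instruction."""

    def __init__(self, size):
        self.slots = [None] * size  # each slot: PRF tags of one instruction's result uops
        self.head = 0               # oldest instruction, next to retire
        self.tail = 0               # next free slot
        self.count = 0
        self.written = set()        # PRF tags whose results have been written
        self.next_tag = 0           # simplification: tags just count up, never reclaimed

    def allocate(self, n_result_uops):
        """Give one instruction a ROB slot plus fresh PRF tags; None means stall."""
        if self.count == len(self.slots):
            return None
        tags = list(range(self.next_tag, self.next_tag + n_result_uops))
        self.next_tag += n_result_uops
        slot = self.tail
        self.slots[slot] = tags
        self.tail = (self.tail + 1) % len(self.slots)
        self.count += 1
        return slot, tags

    def writeback(self, tag):
        """An execution unit produced a result; this can happen in any order."""
        self.written.add(tag)

    def retire(self):
        """Drain in order from the oldest entry, stopping at the first unfinished one."""
        retired = []
        while self.count and all(t in self.written for t in self.slots[self.head]):
            retired.append(self.head)  # side effects would become visible here
            self.slots[self.head] = None
            self.head = (self.head + 1) % len(self.slots)
            self.count -= 1
        return retired


rob = ReorderBuffer(4)
a_slot, a_tags = rob.allocate(1)  # older instruction, one result uop
b_slot, b_tags = rob.allocate(2)  # younger multi-uop instruction
for t in b_tags:
    rob.writeback(t)              # the younger one finishes first...
print(rob.retire())               # [] : ...but cannot retire past the older one
rob.writeback(a_tags[0])
print(rob.retire())               # [0, 1]
```

The point the sketch tries to show is that execution order and retirement order are decoupled: results can land in the PRF in any order, but the head pointer only moves past an instruction once every register it owns has been written.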

tempguy9999 · on July 7, 2019

I realise you're describing a dataflow engine. Suddenly it's starting to fall into place, and so is Tomasulo's algorithm (which is what this is about?).

Which is a) amazing and b) OMG the frigging complexity of something that has to run at sub-ns speeds. It's like sausages: the closer you look, the less there is to enjoy.

Thanks!

rrss · on July 7, 2019

Thanks. The bit I'm not sure about is how to avoid ending up in a state in the middle of a single instruction when one instruction gets split into multiple uops. For example, one instruction gets split into two uops, and the first one completes but the second one raises an exception.

tempguy9999 · on July 7, 2019

OK, let me try (Tuna-Fish, put me right at any point).

> Like if one instruction gets split into two uops and the first one completes but the second one raises an exception.

That's not a problem. It's just one of the n exception types that instruction can raise. Take a macro-op division instruction (say an x64 instruction, if something like this exists) where one operand can be fetched from memory. You could have

  r2 <- r3 / ^r4

where ^r4 fetches the contents of memory at address held in r4.

Suppose it's split up into u-ops:

  tr6 <- ^r4  ;; tr6 is temporary register 6, invisible to programmer
  r2 <- r3 / tr6

You could have a division by zero at u-op 2, or an invalid address exception at u-op 1. Either of those is a valid exception for the original single macro-op.

Extrapolating from what Tuna-Fish said, the ROB is a list of macro instructions. I assume each instruction is tagged with its actual macro-op address, and each u-op must link back to its originating macro-op so macro-op retirement can take place. So each u-op carries a small pointer (8 bits? The ROB queue is small) back to its macro-op in the ROB.

Follow the 8-bit u-op pointer to the ROB, get the originating macro-op address, and raise the exception at that address.
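Here is a hedged Python sketch of that back-pointer idea, using the division example above. It assumes (my assumption, consistent with Tuna-Fish's point that side effects only become visible at retirement) that a faulting u-op merely records the fault in its macro-op's ROB entry, and the fault is taken when that entry reaches the head. That is also why rrss's "half-finished instruction" never becomes visible: u-op 1's result sits in a temporary and is thrown away. `MacroRob`, `RobEntry`, `Uop`, `PreciseFault` and `ROB_SIZE` are all hypothetical names, and overflow checks are omitted.

```python
from dataclasses import dataclass
from typing import List, Optional

ROB_SIZE = 256  # hypothetical size; an 8-bit index is enough to address it


@dataclass
class RobEntry:
    macro_pc: int                # address of the originating macro-op
    pending_uops: int            # uops that have not finished executing
    fault: Optional[str] = None  # first exception reported by one of its uops


@dataclass
class Uop:
    rob_idx: int                 # small back-pointer to the macro-op's ROB entry
    text: str


class PreciseFault(Exception):
    def __init__(self, kind: str, pc: int):
        super().__init__(f"{kind} at {pc:#x}")
        self.kind = kind
        self.pc = pc


class MacroRob:
    def __init__(self):
        self.slots: List[Optional[RobEntry]] = [None] * ROB_SIZE
        self.head = 0  # oldest macro-op
        self.tail = 0  # next free slot (overflow checks omitted)

    def issue(self, pc: int, uop_texts: List[str]) -> List[Uop]:
        """One ROB entry per macro-op; every uop it splits into points back to it."""
        idx = self.tail
        self.slots[idx] = RobEntry(pc, len(uop_texts))
        self.tail = (self.tail + 1) % ROB_SIZE
        return [Uop(idx, t) for t in uop_texts]

    def finish(self, uop: Uop, fault: Optional[str] = None) -> None:
        """A uop completed (in any order). A fault is only recorded, not taken."""
        entry = self.slots[uop.rob_idx]
        entry.pending_uops -= 1
        if fault is not None and entry.fault is None:
            entry.fault = fault

    def retire_oldest(self) -> Optional[int]:
        """Retire the oldest macro-op, or take its fault at the macro-op's address."""
        entry = self.slots[self.head]
        if entry is None:
            return None
        if entry.fault is not None:
            # Nothing from this macro-op was committed: flush it and everything younger.
            self.slots = [None] * ROB_SIZE
            self.head = self.tail = 0
            raise PreciseFault(entry.fault, entry.macro_pc)
        if entry.pending_uops:
            return None
        self.slots[self.head] = None
        self.head = (self.head + 1) % ROB_SIZE
        return entry.macro_pc


rob = MacroRob()
load, div = rob.issue(0x401000, ["tr6 <- ^r4", "r2 <- r3 / tr6"])
rob.finish(load)                       # uop 1 completes; tr6 is not architectural state
rob.finish(div, fault="divide error")  # uop 2 hits a division by zero
try:
    rob.retire_oldest()
except PreciseFault as e:
    print(e)                           # divide error at 0x401000
```

Whichever u-op faults, the exception is reported at the macro-op's address, so from the programmer's side it looks like the single x86 instruction faulted as a whole.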

Assuming I'm right, and assuming I understood your question correctly. I'll have to read his answer more carefully again.

edit: used ^ instead of asterisk as the deref operator, since stars were interpreted as formatting. Edit 2: slightly clearer.

atq2119 · on July 6, 2019

Perhaps there are two copies of the program pointer, one at the "top of the pipe" updated by instruction decode and branch prediction, and one at the "bottom of the pipe" updated by the ROB. Then uops only need to carry the amount by which the program counter is advanced, and certain events can cause a pipeline flush and copy the bottom of pipe version of the counter to the top of the pipe.

But that's also all speculation :)
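A minimal Python sketch of the two-copies idea, assuming straight-line code only. The class and method names are made up, each in-flight entry carries nothing but the instruction's length, and taken branches are deliberately not modeled, which is exactly the gap raised in the reply.

```python
from collections import deque


class TwoCopyPC:
    """Speculative front-end PC plus an architectural PC updated only at retirement."""

    def __init__(self, start_pc):
        self.front_pc = start_pc   # "top of the pipe": advanced by decode/prediction
        self.retire_pc = start_pc  # "bottom of the pipe": advanced by the ROB
        self.in_flight = deque()   # per instruction, only the amount the PC advances

    def decode(self, length):
        self.in_flight.append(length)
        self.front_pc += length

    def retire(self):
        self.retire_pc += self.in_flight.popleft()

    def flush(self):
        """Discard speculative work and restart fetch from the architectural PC."""
        self.in_flight.clear()
        self.front_pc = self.retire_pc


pc = TwoCopyPC(0x1000)
for length in (3, 2, 5):      # three straight-line instructions of these byte lengths
    pc.decode(length)
pc.retire()                   # the first instruction retires
print(hex(pc.retire_pc))      # 0x1003: address of the oldest unretired instruction
pc.flush()                    # e.g. that instruction faulted
print(hex(pc.front_pc))       # 0x1003
```

The appeal is that the retirement-side PC always names the oldest unretired instruction, so a fault or flush knows where to restart without each uop carrying a full address.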

tempguy9999 · on July 6, 2019

> But that's also all speculation :)

How appropriate!

But I think your idea won't work in general when there's branching.

strmpnk · on July 6, 2019

I see what they mean then. I read the question a bit too fast.

I was mostly assuming that issuing the operation reserved a slot in the outstanding-operations buffer, which would ensure the write (or commit) is sequenced after execution, since commit walks through the buffer in order.

But you're right that there are still more questions about how some of that data is tracked through the pipeline.