The parent's ROB is the reorder buffer. AIUI it causes instructions to be retired in order (with exceptions held until retirement, then exposed). The original question, though, is how a particular u-op is mapped back to the original macro-instruction, so we know which macro-instruction raised the exception.
And I don't know. I guess if each u-op were tagged with the instruction address within the process, that would do, but that's carrying around at least 32 bits, which is quite a large tag.
Alternatively, tag indirectly, which is more likely: you can have maybe 256 instructions 'hot' at any time, so an 8-bit tag on each u-op could point to a 32- or 64-bit table entry (edit: holding the actual address of the macro-op). And the window for the ROB and the other thing that does instruction issue is ~200 instructions, so that sounds more plausible.
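To make the indirect-tag idea concrete, here is a rough Python sketch. Everything in it is hypothetical: the `AddressTable` class, the 256-entry size and the round-robin allocation are illustrative assumptions, not a description of what any real core does.

```python
# Hypothetical sketch of indirect tagging: each u-op carries a small tag,
# and the full macro-op address is stored once in a side table.
# TABLE_SIZE and the allocation policy are assumptions for illustration.

TABLE_SIZE = 256  # assumed number of macro-ops that can be in flight


class AddressTable:
    def __init__(self):
        self.slots = [None] * TABLE_SIZE
        self.next_tag = 0

    def allocate(self, macro_addr):
        """Store the macro-op address once and hand back an 8-bit tag."""
        tag = self.next_tag
        if self.slots[tag] is not None:
            # Slot still in use: a real front end would have to stall here.
            raise RuntimeError("address table full")
        self.slots[tag] = macro_addr
        self.next_tag = (tag + 1) % TABLE_SIZE
        return tag

    def lookup(self, tag):
        """Recover the full macro-op address from a u-op's tag."""
        return self.slots[tag]

    def release(self, tag):
        """Free the slot once the macro-op has retired."""
        self.slots[tag] = None
```

Each u-op would then carry only `tag` (8 bits under these assumptions) instead of the full address, and `lookup(tag)` is only needed when something goes wrong.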
Your alternative is sort of close. What happens is that every x86 instruction is assigned a ROB entry. Every uop that has results (stores are handled separately) is assigned a clean register out of the physical register file, and the address of this register (or multiple registers in case of multi-uop instructions) is stored in the corresponding ROB entry. The ROB acts like an in-order circular list -- the retire phase drains it in order from the oldest first, retiring the oldest instruction if and only if all the corresponding PRF registers have been written to. This is the point where any and all side-effects are made visible.
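A minimal Python sketch of that retirement rule, as I read the description above. The names (`Rob`, `RobEntry`, `writeback`) and the tiny 16-entry register file are made up for illustration, and stores are left out since they are handled separately.

```python
from collections import deque


class RobEntry:
    def __init__(self, macro_addr, dest_regs):
        self.macro_addr = macro_addr      # address of the x86 instruction
        self.dest_regs = list(dest_regs)  # PRF indices its u-ops will write


class Rob:
    def __init__(self, prf_size=16):      # prf_size is an arbitrary assumption
        self.entries = deque()            # oldest instruction on the left
        self.prf_written = [False] * prf_size

    def allocate(self, macro_addr, dest_regs):
        """Give an instruction a ROB entry and mark its registers as clean."""
        for reg in dest_regs:
            self.prf_written[reg] = False
        self.entries.append(RobEntry(macro_addr, dest_regs))

    def writeback(self, reg):
        """A u-op finished, possibly out of order, and wrote its register."""
        self.prf_written[reg] = True

    def retire(self):
        """Drain from the oldest end while the oldest entry is complete."""
        retired = []
        while self.entries and all(
            self.prf_written[reg] for reg in self.entries[0].dest_regs
        ):
            retired.append(self.entries.popleft().macro_addr)
        return retired
```

With two instructions allocated, finishing the younger one first retires nothing; once the older one's registers are all written, both retire in program order.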
I realise you're describing a dataflow engine. Suddenly Tomasulo's algorithm (which this is about?) is starting to fall into place.
Which is a) amazing and b) OMG the frigging complexity of something that has to run at sub-ns speeds. It's like sausages, the closer you look the less there is to enjoy.
Thanks. The bit I'm not sure about is how to avoid ending up in a state in the middle of a single instruction if that instruction gets split into multiple uops. Like if one instruction gets split into two uops and the first one completes but the second one raises an exception.
OK, let me try (Tuna-Fish, put me right at any point).
> Like if one instruction gets split into two uops and the first one completes but the second one raises an exception.
That's not a problem. It's just one of n exception types that instruction can raise. Suppose you have a macro division instruction (say an x64 instruction, if something like this exists) where one operand can be fetched from memory:
r2 <- r3 / ^r4
where ^r4 fetches the contents of memory at address held in r4.
Suppose it's split up into u-ops:
tr6 <- ^r4 ;; tr6 is temporary register 6, invisible to programmer
r2 <- r3 / tr6
You could have a division by zero at u-op 2, or an invalid address exception at u-op 1. Either of those is a valid exception for the original single macro-op.
Extrapolating from what Tuna-Fish said, the ROB is a list of macro instructions. I assume each instruction is tagged with its actual macro-op address, and each u-op must link back to its originating macro-op so that macro-op retirement can take place. So we have a small pointer (8 bits? because the ROB queue is small) from each u-op back to the macro-op in the ROB.
Follow the 8-bit u-op ptr to the ROB, get the originating macro-op address, raise exception at that address.
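Here's that guess as a small Python sketch, using the two-u-op division from above. It's speculation in code form: `MacroOp`, `UOp`, `execute`, `retire` and `demo` are invented names, Python exceptions stand in for hardware faults, and a missing dict key plays the role of an invalid address.

```python
class MacroOp:
    """One ROB entry: the macro-op address plus bookkeeping for its u-ops."""

    def __init__(self, addr, num_uops):
        self.addr = addr
        self.pending = num_uops   # u-ops that have not finished yet
        self.exception = None     # held here until retirement


class UOp:
    def __init__(self, rob_index, action):
        self.rob_index = rob_index  # small pointer back into the ROB
        self.action = action        # callable doing the work; may raise


def execute(rob, uop):
    """Run a u-op; on a fault, record it in the ROB entry, don't expose it."""
    entry = rob[uop.rob_index]
    try:
        uop.action()
    except Exception as exc:
        if entry.exception is None:
            entry.exception = exc
    entry.pending -= 1


def retire(rob, index):
    """Only at retirement is the fault reported, at the macro-op's address."""
    entry = rob[index]
    if entry.pending > 0:
        return None
    if entry.exception is not None:
        return ("fault", entry.addr, type(entry.exception).__name__)
    return ("ok", entry.addr, None)


def demo(memory_value):
    """r2 <- r3 / ^r4, split into a load u-op and a divide u-op."""
    rob = [MacroOp(addr=0x401000, num_uops=2)]
    regs = {"r3": 10, "r4": 0x2000}
    temps = {}                      # temporaries invisible to the programmer
    memory = {0x2000: memory_value}

    def load():
        temps["tr6"] = memory[regs["r4"]]

    def divide():
        regs["r2"] = regs["r3"] // temps["tr6"]

    for uop in (UOp(0, load), UOp(0, divide)):
        execute(rob, uop)
    return retire(rob, 0)
```

Calling `demo(0)` makes the second u-op fault, yet the report still names the single macro-op at `0x401000`; `demo(5)` retires cleanly.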
Assuming I'm right, and assuming I understood your question correctly. I'll have to read his answer more carefully again.
edit: swapped ^ for asterisk as deref operation, as stars interpreted as formatting. Edit 2: slightly clearer.
Perhaps there are two copies of the program pointer, one at the "top of the pipe" updated by instruction decode and branch prediction, and one at the "bottom of the pipe" updated by the ROB. Then uops only need to carry the amount by which the program counter is advanced, and certain events can cause a pipeline flush and copy the bottom of pipe version of the counter to the top of the pipe.
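A rough Python sketch of that two-counter scheme, purely as a thought experiment. The `Pipeline` class and its method names are hypothetical, and `delta` is simply whatever amount the instruction advances the program counter by.

```python
from collections import deque


class Pipeline:
    def __init__(self, start_pc):
        self.fetch_pc = start_pc    # "top of the pipe", speculative
        self.retire_pc = start_pc   # "bottom of the pipe", committed
        self.in_flight = deque()    # per-instruction PC deltas, oldest first

    def fetch(self, delta):
        """Decode one instruction; only its PC delta travels down the pipe."""
        self.in_flight.append(delta)
        self.fetch_pc += delta

    def retire(self):
        """Retire the oldest instruction and advance the committed PC."""
        self.retire_pc += self.in_flight.popleft()

    def flush(self):
        """Discard everything in flight and restart from the committed PC."""
        self.in_flight.clear()
        self.fetch_pc = self.retire_pc
        return self.retire_pc
```

If the oldest unretired instruction faults, `retire_pc` is already its address, so nothing in flight has to carry a full address.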
I see what they mean then. I read the question a bit too fast.
I was mostly assuming that issuing the operation reserved a spot in the outstanding buffer, which would ensure the write or commit is sequenced after execution by walking through the buffer in order.
But you're right that there are still more questions about how some of that data is tracked through the pipeline.
All speculation on my part though!