You’re really reconstructing a data depence graph not an interference graph. In ...

rayiner · on Oct 18, 2018

A couple of more points. The architectural registers are real places. Say you write to RDX, don’t do anything with it for a million instructions, then read from it. Where does the data come from? There is an architectural register file and RDX is a specific register in it. The renaming is only temporary. An instruction doesn’t hang on to a ROB entry and rename register until every instruction that may use it has executed. When the instruction is retired in order, the value is written back to the architectural register file, and the rename register and ROB entry is freed for some other instruction. So the renaming is ephemeral, only lasting until retirement.

Tuna-Fish · on Oct 18, 2018

> A couple of more points. The architectural registers are real places. Say you write to RDX, don’t do anything with it for a million instructions, then read from it. Where does the data come from? There is an architectural register file and RDX is a specific register in it.

This was true on all the P6-derived cores, but is no longer true on any modern CPU that uses a PRF.

On PRF cpus (SNB and everything after it), the final backing store of register data is the PRF, and the architectural registers are just pointers to this file. Every instruction gets renamed in the frontend, and there simply is no architectural register file.

Also, ROB does a very different task on those CPUs, to the point that still calling it ROB is misleading. The current architectural register names are held in the frontend, as part of the register rename hardware, not in the ROB. ROB holds on to instruction status, and the PRF register names for all the sources. So an used PRF entry can be garbage collected when it no longer exists in either the architectural name file or the ROB.

For a more detailed explanation, read: https://www.realworldtech.com/sandy-bridge/5/

rayiner · on Oct 18, 2018

That's correct, but a bit in the weeds. I was addressing OP's statement: "rdi, rsi, etc. are nothing but compressed node identifiers in an interference graph." In a modern CPU, this is true only for speculative state. If I have "ADD RAX, 1" and RAX was last written by an in-flight operation, I don't care about RAX qua RAX. It's just a name that connects an operation that produces a value to an operation that consumes it. However, if the instruction that last wrote RAX was committed long ago, what we care about is the architectural state of RAX. That distinguishes modern OOO CPUs from pure data flow machines.

This fact is easiest to see in a ROB design: https://courses.cs.washington.edu/courses/cse378/10au/lectur... (page 7). The front-end rename table may point either to a ROB entry, or the architectural register file. Note, however, that the basic distinction exists in a PRF design too--some physical registers will hold architectural values (pointed to by the RRAT or similar structure) while others will hold speculative values. We can quibble about terminology, but there will be a subset of the PRF, that is pointed to by the RRAT, that contains the machine's non-speculative, architectural state.

Tuna-Fish · on Oct 18, 2018

> However, if the instruction that last wrote RAX was committed long ago, what we care about is the architectural state of RAX.

What you care about is which PRF entry will contain the valid value of RAX for the instruction going through rename. Unlike in earlier designs, post-SNB ones always do exactly the same thing during rename, that is, allocate a clean PRF register as destination and read the PRF number currently stored under the name RAX in the RAT and store that as source. Whether that entry was written to by the previous instruction, or has been stale for a thousand instructions is entirely irrelevant.

The ROB design document you posted is explicitly of an older system, that is different from modern ones. On post-SNB designs, the rename table entry and ROB entries can only point to the PRF (the ROB no longer ever contains register state!), and there is no architectural register file.

> Note, however, that the basic distinction exists in a PRF design too--some physical registers will hold architectural values

While technically true, this is extremely misleading. The architectural values reside in whatever rename register was last assigned to them. For the entire pipeline, there is no distinction between architectural registers and the rest of them.

My original complaint is of this sentence in your OP:

> When the instruction is retired in order, the value is written back to the architectural register file

Which is explicitly false of every Intel CPU released since 2011 (except maybe some atoms?). PRF machines never move values from PRF. It is the final source of truth for all registers. When "ADD RAX, 1" is executed, a new PRF entry is allocated for the result, and once that is written, it remains where RAX lives until another instruction that stores to RAX is executed, no matter how far away that is.

rayiner · on Oct 18, 2018

> What you care about is which PRF entry will contain the valid value of RAX for the instruction going through rename... The ROB design document you posted is explicitly of an older system, that is different from modern ones.

You're getting caught up in where the data is stored while I'm talking about the nature of the data conceptually, vis-a-vis OP's analogy between register renaming and graphs. I didn't post the ROB example to disagree with you about how post-SNB CPUs work re: PRF (your description is correct). I posted it because ROBs are a good illustration of the difference between speculative values and committed values, since they store speculative values in one place (the ROB) and committed values in the other (the ARF). The same distinction exists with PRFs (see below), but it's harder to see.

> The architectural values reside in whatever rename register was last assigned to them. For the entire pipeline, there is no distinction between architectural registers and the rest of them.

Architectural registers are different because they will have an RRAT entry pointing to them. And there certainly is a distinction between PRF entries that hold architectural state and PRF entries that hold speculative state, e.g. when there is an exception. That's why the RRAT exists in the first place--so the CPU can identify the subset of the PRF that holds the architectural state. If there was "no distinction" you wouldn't have an RRAT.

Concrete example:

MOV EAX, 15

-------------- < committed

MOV ECX, [EBX]

IMUL EAX, 5

ADD EAX, 1

-------------- < renamed

Say the first instruction has committed, the second has issued, and the others are waiting to issue. Say EAX in the first instruction is allocated to PR0, in the third it's PR1, and in the fourth its PR2. The RRAT has an entry for EAX that points to PR0. The front-end RAT has an entry for EAX that points to PR2. PR0 holds committed state. PR1 and PR2 (will) hold speculative state. The reservation stations and ROB encode a data-flow dependency between the fourth and third instructions. (PR1 is an operand to the fourth instruction, which produces PR2). There is no data dependence graph for anything that's committed, however. It's just values in architectural registers.

If the load throws a page fault, what happens? The machine grabs the architectural state from the RRAT. Execution restarts with PR0 in the front-end RAT entry for EAX, and the page-fault handler sees EAX == 15. If the third and fourth instructions issued in the meantime, their results are thrown away. That is the difference between architectural and speculative state.

> Whether that entry was written to by the previous instruction, or has been stale for a thousand instructions is entirely irrelevant.

It's relevant to the analogy that OP was making between register renaming and graphs. In a data dependence graph, all data flow can be represented with edges from an operations that produce operands to the operations that consume those operands. In an out-of-order machine, that graph exists (after renaming) only for instructions that are still in flight. For instructions that were committed long ago, the graph is collapsed to the result, denoted by an architectural register.

> Which is explicitly false of every Intel CPU released since 2011

The point (for purposes of responding to the OP's analogy) is that you need to update the architectural state when an instruction retires. The fact that "post-2011 Intel CPUs" do this by flipping a pointer rather than copying the data is an implementation detail that is educational, but not really relevant.

KMag · on Oct 18, 2018

Interesting. I always figured there was just a lookup table mapping ISA registers to physical registers that was used by instruction decode, a bit vector (one bit per physical register) indicating which physical registers were in use, and each micro-op would carry with it which physical register(s) would be freed up upon instruction retirement.

If a register might or might not be renamed, what's the advantage of usually storing a given register in a given location? Is the main reason power savings, so checking a single bit to check if a register has been renamed, before paying the power and time to perform a lookup in the renaming table? Is access to a renamed register then slower than access to a non-renamed register?

rayiner · on Oct 18, 2018

You need to be able to recover architectural register state (e.g. if there is a branch misprediction, or an instruction throws an exception). Many implementations store speculative results in a reorder buffer, and write to an architectural register file on commit. The architectural register file always reflects the committed state of the processor. In such an implementation, your renaming table (the "lookup table mapping ISA registers to physical registers") might point to either a ROB entry, if the most recent write to an architectural register was an instruction that is still in flight, or the architectural register file.

Another way to do it is to use a physical register file to store data, as Tuna-Fish explains above. In that case, your front-end table always points into the PRF, but it may be a speculative value (written by an instruction that's still in the ROB and not yet committed) or a committed value. But that doesn't let you recover the architectural state on an exception. So you keep a second table (in the Pentium 4, it's called the RRAT), and whenever an instruction commits, you update that table to indicate that the most recent committed version of a particular architectural register can be found in a particular physical register.

There is at least a third way to do it, which doesn't use an RRAT. MIPS R10k rolls back the ROB to recover the last committed architectural state.

Tuna-Fish · on Oct 18, 2018

You are more or less correct, he was wrong. Architectural registers are just pointers to the PRF these days.

monocasa · on Oct 18, 2018

Are you sure about that? The HDL I've seen doesn't appear to do it that way. If I'm reading BOOM's source correctly for instance, there's not a separate architectural file.