
That's correct, but a bit in the weeds. I was addressing OP's statement: "rdi, rsi, etc. are nothing but compressed node identifiers in an interference graph." In a modern CPU, this is true only for speculative state. If I have "ADD RAX, 1" and RAX was last written by an in-flight operation, I don't care about RAX qua RAX. It's just a name that connects an operation that produces a value to an operation that consumes it. However, if the instruction that last wrote RAX was committed long ago, what we care about is the architectural state of RAX. That distinguishes modern OOO CPUs from pure data flow machines.

This fact is easiest to see in a ROB design: https://courses.cs.washington.edu/courses/cse378/10au/lectur... (page 7). The front-end rename table may point either to a ROB entry, or the architectural register file. Note, however, that the basic distinction exists in a PRF design too--some physical registers will hold architectural values (pointed to by the RRAT or similar structure) while others will hold speculative values. We can quibble about terminology, but there will be a subset of the PRF, that is pointed to by the RRAT, that contains the machine's non-speculative, architectural state.




> However, if the instruction that last wrote RAX was committed long ago, what we care about is the architectural state of RAX.

What you care about is which PRF entry will contain the valid value of RAX for the instruction going through rename. Unlike in earlier designs, post-SNB ones always do exactly the same thing during rename, that is, allocate a clean PRF register as destination and read the PRF number currently stored under the name RAX in the RAT and store that as source. Whether that entry was written to by the previous instruction, or has been stale for a thousand instructions is entirely irrelevant.
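To make the rename step concrete, here is a minimal sketch of that behavior in Python. It is a toy model, not a description of any real core: the `Renamer` class, its field names, and the initial identity mapping are all assumptions made for illustration.

```python
# Toy model of a PRF-style rename stage. All names here are hypothetical.
class Renamer:
    def __init__(self, arch_regs, num_phys):
        # Assumption: at reset, each architectural name maps to its own
        # physical register; the rest of the PRF is on the free list.
        self.rat = {name: i for i, name in enumerate(arch_regs)}
        self.free_list = list(range(len(arch_regs), num_phys))

    def rename(self, dest, sources):
        # Read source mappings first, so "ADD RAX, 1" sees the old RAX.
        src_phys = [self.rat[s] for s in sources]
        # Always allocate a fresh physical register for the destination.
        # (A real machine would stall when the free list is empty.)
        new_phys = self.free_list.pop(0)
        self.rat[dest] = new_phys
        return new_phys, src_phys


r = Renamer(["RAX", "RBX"], num_phys=8)
first = r.rename("RAX", ["RAX"])   # ADD RAX, 1
second = r.rename("RAX", ["RAX"])  # ADD RAX, 1 again
```

Note that `rename` does the same thing no matter how long ago the source mapping was written; it never asks whether the producer has committed.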

The ROB design document you posted is explicitly of an older system, that is different from modern ones. On post-SNB designs, the rename table entry and ROB entries can only point to the PRF (the ROB no longer ever contains register state!), and there is no architectural register file.

> Note, however, that the basic distinction exists in a PRF design too--some physical registers will hold architectural values

While technically true, this is extremely misleading. The architectural values reside in whatever rename register was last assigned to them. For the entire pipeline, there is no distinction between architectural registers and the rest of them.

My original complaint is of this sentence in your OP:

> When the instruction is retired in order, the value is written back to the architectural register file

Which is explicitly false of every Intel CPU released since 2011 (except maybe some Atoms?). PRF machines never move values out of the PRF. It is the final source of truth for all registers. When "ADD RAX, 1" is executed, a new PRF entry is allocated for the result, and once that is written, it remains where RAX lives until another instruction that stores to RAX is executed, no matter how far away that is.


> What you care about is which PRF entry will contain the valid value of RAX for the instruction going through rename... The ROB design document you posted is explicitly of an older system, that is different from modern ones.

You're getting caught up in where the data is stored while I'm talking about the nature of the data conceptually, vis-a-vis OP's analogy between register renaming and graphs. I didn't post the ROB example to disagree with you about how post-SNB CPUs work re: PRF (your description is correct). I posted it because ROBs are a good illustration of the difference between speculative values and committed values, since they store speculative values in one place (the ROB) and committed values in the other (the ARF). The same distinction exists with PRFs (see below), but it's harder to see.

> The architectural values reside in whatever rename register was last assigned to them. For the entire pipeline, there is no distinction between architectural registers and the rest of them.

Architectural registers are different because they will have an RRAT entry pointing to them. And there certainly is a distinction between PRF entries that hold architectural state and PRF entries that hold speculative state, e.g. when there is an exception. That's why the RRAT exists in the first place--so the CPU can identify the subset of the PRF that holds the architectural state. If there were "no distinction" you wouldn't have an RRAT.

Concrete example:

MOV EAX, 15

-------------- < committed

MOV ECX, [EBX]

IMUL EAX, 5

ADD EAX, 1

-------------- < renamed

Say the first instruction has committed, the second has issued, and the others are waiting to issue. Say EAX in the first instruction is allocated to PR0, in the third it's PR1, and in the fourth it's PR2. The RRAT has an entry for EAX that points to PR0. The front-end RAT has an entry for EAX that points to PR2. PR0 holds committed state. PR1 and PR2 (will) hold speculative state. The reservation stations and ROB encode a data-flow dependency between the fourth and third instructions. (PR1 is an operand to the fourth instruction, which produces PR2.) There is no data dependence graph for anything that's committed, however. It's just values in architectural registers.

If the load throws a page fault, what happens? The machine grabs the architectural state from the RRAT. Execution restarts with PR0 in the front-end RAT entry for EAX, and the page-fault handler sees EAX == 15. If the third and fourth instructions issued in the meantime, their results are thrown away. That is the difference between architectural and speculative state.
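The same scenario as a rough Python sketch, for anyone who wants to poke at it. This is a toy model under stated assumptions: the `Machine` class and its method names are invented for illustration, the load's ECX destination is left out so the numbering matches PR0/PR1/PR2 above, and recovery is modeled as simply copying the RRAT into the front-end RAT.

```python
# Toy model of RAT/RRAT recovery. All names here are hypothetical.
class Machine:
    def __init__(self, num_phys):
        self.prf = [None] * num_phys          # physical register file
        self.free_list = list(range(num_phys))
        self.rat = {}    # front-end map: includes speculative mappings
        self.rrat = {}   # retirement map: architectural state only
        self.rob = []    # in-flight (arch_reg, phys_reg), program order

    def rename_dest(self, reg):
        pr = self.free_list.pop(0)
        self.rat[reg] = pr
        self.rob.append((reg, pr))
        return pr

    def commit_oldest(self):
        reg, pr = self.rob.pop(0)
        old = self.rrat.get(reg)
        self.rrat[reg] = pr        # flip a pointer; no data is copied
        if old is not None:
            self.free_list.append(old)

    def flush(self):
        # Exception: discard every in-flight result, restore the RAT.
        for _, pr in self.rob:
            self.prf[pr] = None
            self.free_list.append(pr)
        self.rob.clear()
        self.rat = dict(self.rrat)


m = Machine(num_phys=4)
pr0 = m.rename_dest("EAX")   # MOV EAX, 15
m.prf[pr0] = 15
m.commit_oldest()            # ---- committed
pr1 = m.rename_dest("EAX")   # IMUL EAX, 5
pr2 = m.rename_dest("EAX")   # ADD EAX, 1
m.prf[pr1], m.prf[pr2] = 75, 76  # suppose both executed speculatively
rat_before = m.rat["EAX"]
m.flush()                    # the load page-faults
eax_seen_by_handler = m.prf[m.rat["EAX"]]
```

Before the flush the front-end RAT points EAX at PR2; afterwards it points at PR0 again, and the speculative results in PR1 and PR2 are gone.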

> Whether that entry was written to by the previous instruction, or has been stale for a thousand instructions is entirely irrelevant.

It's relevant to the analogy that OP was making between register renaming and graphs. In a data dependence graph, all data flow can be represented with edges from the operations that produce operands to the operations that consume those operands. In an out-of-order machine, that graph exists (after renaming) only for instructions that are still in flight. For instructions that were committed long ago, the graph is collapsed to the result, denoted by an architectural register.

> Which is explicitly false of every Intel CPU released since 2011

The point (for purposes of responding to the OP's analogy) is that you need to update the architectural state when an instruction retires. The fact that "post-2011 Intel CPUs" do this by flipping a pointer rather than copying the data is an implementation detail that is educational, but not really relevant.



