The target is not static; it is dynamically updated as part of history.
Prefetch chaining is to get code out of DRAM, and it runs DRAM-latency ahead of execution. Separately, fetch chaining is to get the code up to the L1, and it runs L2/L3-latency ahead of execution. Load chaining gets the lines from the L1 to the L0 micro-caches and the decoders, and runs decode-latency (3 cycles on the Mill) ahead of execution.
The Mill stages instructions like this because the further ahead in time an action is the more likely that the action is down a path that execution will not take. We don't prefetch to the L1 because we don't have enough confidence that we will need the code to be willing to spam a scarce resource like the L1. But avoiding a full DRAM hit is important too, so we stage the fetching. It doesn't matter at all in small codes that spend all their time in a five-line loop, but that's not the only kind of codes there are :-)
Prefetch chaining is to get code out of DRAM, and it runs DRAM-latency ahead of execution. Separately, fetch chaining is to get the code up to the L1, and it runs L2/L3-latency ahead of execution. Load chaining gets the lines from the L1 to the L0 micro-caches and the decoders, and runs decode-latency (3 cycles on the Mill) ahead of execution.
The Mill stages instructions like this because the further ahead in time an action is the more likely that the action is down a path that execution will not take. We don't prefetch to the L1 because we don't have enough confidence that we will need the code to be willing to spam a scarce resource like the L1. But avoiding a full DRAM hit is important too, so we stage the fetching. It doesn't matter at all in small codes that spend all their time in a five-line loop, but that's not the only kind of codes there are :-)