> It has to hold significantly more information than a BTB if Mill wants to replace all the other associated branch prediction machinery with a table that needs to predict every target.
The exit table and the prediction cache that they've described are only intended to be the common subset of prediction mechanisms that differ from a typical CPU, while a high-end Mill could also be incorporating many of the tricks you would expect on a desktop-class CPU, but those details were beyond the scope of the prediction talk.
High-end x86 CPUs already have pretty complex BTBs, often multi-level caches with different capacity for different branch types and limits on how many branches per I$ line. The only extra thing the Mill is adding with its prediction cache is the length information on how many bytes of instructions to prefetch and decode to get to the expected branch point (whereupon the prediction cache will be consulted a second time to start on the next EBB). As with x86 cores, I don't think the prediction cache/BTB would always be the same buffer or same size as the history pattern table or other model-specific prediction mechanisms.
The exit table is further from the execution pipeline and isn't on the critical path for recovering from a mispredict. It gets feedback from the core but only directly feeds into the prefetcher. This is the cache that will have the save/restore mechanisms and will load data from RAM in batches. It sounds a lot like a TLB in these respects, though the context switch that triggers loading new data is a function call rather than only a process switch. It definitely doesn't need to predict everything. In the talk it was mentioned that the exit table would have victim buffers and has a 3-cycle latency to the prefetcher. (And remember that Mill clock cycles will probably be pretty long in initial implementations.)
The exit table and the prediction cache that they've described are only intended to be the common subset of prediction mechanisms that differ from a typical CPU, while a high-end Mill could also be incorporating many of the tricks you would expect on a desktop-class CPU, but those details were beyond the scope of the prediction talk.
High-end x86 CPUs already have pretty complex BTBs, often multi-level caches with different capacity for different branch types and limits on how many branches per I$ line. The only extra thing the Mill is adding with its prediction cache is the length information on how many bytes of instructions to prefetch and decode to get to the expected branch point (whereupon the prediction cache will be consulted a second time to start on the next EBB). As with x86 cores, I don't think the prediction cache/BTB would always be the same buffer or same size as the history pattern table or other model-specific prediction mechanisms.
The exit table is further from the execution pipeline and isn't on the critical path for recovering from a mispredict. It gets feedback from the core but only directly feeds into the prefetcher. This is the cache that will have the save/restore mechanisms and will load data from RAM in batches. It sounds a lot like a TLB in these respects, though the context switch that triggers loading new data is a function call rather than only a process switch. It definitely doesn't need to predict everything. In the talk it was mentioned that the exit table would have victim buffers and has a 3-cycle latency to the prefetcher. (And remember that Mill clock cycles will probably be pretty long in initial implementations.)