If true, that'd be a very high number. Intel can get 2 on some benchmarks, but system-average IPC tends towards 1. (In terms of instructions standardized for cross-platform comparison. I take data on ARM servers and I'm jellous of 1.)
Are any Intel engineers on this thread saying, whelp if they can do 6 we may as well give up? Or does the lack of any data help you sleep soundly and skeptically?
Sure. In theory Skylake can deliver ~8uOPs per cycle in parallel.
Do they in real code?
No.
The Mill team claims (with no running general purpose code OR data of any sort) that they can. YMMV, but I bet Intel sleeps safe. Which sucks because I actually want innovation in computer architecture, which is a notoriously uninnovative industry, and the Mill has some genuinely good ideas, but I don't think they'll work here.
ILP and IPC are not the same thing. ILP (instruction-level parallelism) is the number of instructions executed in a non-stalled clock cycle. IPC (instructions per clock) is the average number of instructions executed per clock, including clock cycles spent waiting for main memory and recovering from branch mis-prediction.
Decades of optimization on a crap architecture. The Mill has been able to learn from the mistakes made in every architecture to date. The kind of comparison you make doesn't mean anything as a result.
Are any Intel engineers on this thread saying, whelp if they can do 6 we may as well give up? Or does the lack of any data help you sleep soundly and skeptically?