
If true, that'd be a very high number. Intel can get 2 on some benchmarks, but system-average IPC tends towards 1. (In terms of instructions standardized for cross-platform comparison. I take data on ARM servers and I'm jealous of 1.)

Are any Intel engineers on this thread saying, "welp, if they can do 6 we may as well give up"? Or does the lack of any data help you sleep soundly and skeptically?

Sure. In theory Skylake can deliver ~8uOPs per cycle in parallel.

Do they in real code?


The Mill team claims (with no running general-purpose code OR data of any sort) that they can. YMMV, but I bet Intel sleeps safe. That sucks, because I actually want innovation in computer architecture, which is a notoriously uninnovative industry. The Mill has some genuinely good ideas, but I don't think they'll work here.

> In theory Skylake can deliver ~8uOPs per cycle in parallel.

4 µop/cycle after fusion.


ILP and IPC are not the same thing. ILP (instruction-level parallelism) is the number of instructions executed in a non-stalled clock cycle. IPC (instructions per clock) is the average number of instructions executed per clock, including clock cycles spent waiting for main memory and recovering from branch misprediction.
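
To make the distinction concrete, here's a rough sketch with made-up numbers. The cycle and instruction counts are hypothetical, not measurements from any real chip, and the helper names `ipc` and `ilp` are just for illustration.

```python
def ipc(instructions, total_cycles):
    """Average instructions per clock over ALL cycles, stalls included."""
    return instructions / total_cycles


def ilp(instructions, total_cycles, stalled_cycles):
    """Instructions per non-stalled cycle, using the definition above."""
    active_cycles = total_cycles - stalled_cycles
    return instructions / active_cycles


# Hypothetical workload: 4000 instructions retired in 4000 cycles,
# of which 3000 cycles were spent stalled (memory waits, mispredict recovery).
instructions = 4000
total_cycles = 4000
stalled_cycles = 3000

print(ipc(instructions, total_cycles))                  # 1.0
print(ilp(instructions, total_cycles, stalled_cycles))  # 4.0
```

So with these invented numbers, a core can sustain 4 instructions in every cycle where it does work and still average an IPC of 1 once the stalls are counted.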

See slide 44:


It is very dependent on the workload.


Decades of optimization on a crap architecture. The Mill has been able to learn from the mistakes made in every architecture to date. The kind of comparison you make doesn't mean anything as a result.

Intel has also learned from each generation.

CISC is very efficient. CISC designs do the most ops per transistor.

But Intel is constrained by backwards compatibility.

"CISC is very efficient" is a meaningless statement. A specific instruction set and its implementation either are or are not efficient.

Disclaimer: Opinions are solely my own and do not reflect my employer's.

