If true, that'd be a very high number. Intel can get 2 on some benchmarks, but s...

deepnotderp · on July 19, 2017

Sure. In theory Skylake can deliver ~8uOPs per cycle in parallel.

Do they in real code?

No.

The Mill team claims that they can, with no running general-purpose code or data of any sort to back it up. YMMV, but I bet Intel sleeps safe. That sucks, because I actually want innovation in computer architecture, which is a notoriously uninnovative industry. The Mill has some genuinely good ideas, but I don't think they'll work here.

Veedrac · on July 19, 2017

> In theory Skylake can deliver ~8uOPs per cycle in parallel.

4 µop/cycle after fusion.

http://www.agner.org/optimize/blog/read.php?i=650

jimrandomh · on July 19, 2017

ILP and IPC are not the same thing. ILP (instruction-level parallelism) is the number of instructions executed in a non-stalled clock cycle. IPC (instructions per clock) is the average number of instructions executed per clock, including clock cycles spent waiting for main memory and recovering from branch mis-prediction.
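
To make the distinction concrete, here is a minimal sketch that follows the two definitions as stated in this comment. The function names and the cycle counts are made up for illustration, not measurements from any real CPU.

```python
def ipc(instructions: int, total_cycles: int) -> float:
    """Average instructions per clock over ALL cycles, stalls included."""
    if total_cycles <= 0:
        raise ValueError("total_cycles must be positive")
    return instructions / total_cycles


def ilp_non_stalled(instructions: int, total_cycles: int, stalled_cycles: int) -> float:
    """Instructions per clock counting only the non-stalled cycles
    (the comment's working definition of ILP)."""
    active_cycles = total_cycles - stalled_cycles
    if active_cycles <= 0:
        raise ValueError("need at least one non-stalled cycle")
    return instructions / active_cycles


# Hypothetical workload: 1M instructions retired in 1M cycles,
# of which 750k cycles were spent stalled (cache misses, mispredicts).
instructions = 1_000_000
total_cycles = 1_000_000
stalled_cycles = 750_000

print(ipc(instructions, total_cycles))                              # 1.0
print(ilp_non_stalled(instructions, total_cycles, stalled_cycles))  # 4.0
```

With these assumed numbers the core sustains 4 instructions per active cycle, yet the average over the whole run is only 1, which is why a peak-width figure and a measured IPC can differ so much.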

phkahler · on July 19, 2017

See slide 44:

https://riscv.org/wp-content/uploads/2016/01/Wed1345-RISCV-W...

It is very dependent on the workload.

naasking · on July 19, 2017

Decades of optimization on a crap architecture. The Mill has been able to learn from the mistakes made in every architecture to date. The kind of comparison you make doesn't mean anything as a result.

mozumder · on July 20, 2017

Intel has also learned from each generation.

CISC is very efficient. They do the most ops per transistor.

naasking · on July 20, 2017

But Intel is constrained by backwards compatibility.

"CISC is very efficient" is a meaningless statement. A specific instruction set and its implementation are either efficient or not.

dbancajas · on July 19, 2017

Disclaimer: Opinions are solely my own and do not reflect my employer's