
We should distinguish between designing for a benchmark and designing for a set of workloads. Everyone chooses representative workloads they care about and evaluates design choices on a variety of metrics by simulating execution of parts of those workloads.

Linpack is a common go-to number because, for all its flaws, it's a widely quoted one, e.g. it's used for the Top500 ranking. It tends to let the CPU crank away without stressing the interconnect, and is widely viewed as an upper bound on performance for the machine. In the E5 case it will be particularly helped by the move to AVX-enabled cores, and will take more advantage of that than general workloads will. Realistic HPC workloads stress a lot more of the machine beyond the CPU, interconnect performance in particular.
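As a rough illustration of why AVX flatters a Linpack-style number (the core count, clock and per-cycle throughput below are hypothetical, not taken from any specific E5 part): theoretical peak double-precision FLOPS scales directly with SIMD width, so going from 128-bit SSE to 256-bit AVX doubles the headline figure while doing nothing for interconnect-bound codes.

    # Hedged sketch: theoretical peak DP GFLOPS for a hypothetical 8-core, 2.7 GHz chip.
    # Assumes 2 DP flops per SIMD lane per cycle (a mul + add pair); all numbers illustrative.
    def peak_gflops(cores, ghz, simd_width_bits, flops_per_lane_per_cycle=2):
        lanes = simd_width_bits // 64   # double-precision lanes per vector unit
        return cores * ghz * lanes * flops_per_lane_per_cycle

    print(peak_gflops(8, 2.7, 128))   # SSE-width unit: ~86 GFLOPS peak
    print(peak_gflops(8, 2.7, 256))   # AVX-width unit: ~173 GFLOPS peak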

People like to dump on x86, but it's not that bad. There are plenty of features no one really uses that we still carry around, but those features often end up microcoded and don't gunk up the rest of the core. The big issue is decoder power and performance: x86 decode is complex. On the flip side, the code density is pretty good, and that is important. Intel and others have also added various improvements that help avoid the downsides, e.g. decode caching, post-decode loop buffers, uop caches, etc. Plus the new ISA extensions are much kinder.




The problem with x86 is that when you scale the chips to N cores you have N copies of all that dead weight. You might not save many transistors by, say, dropping support for 16 bit floats, relative to how much people would hate you for doing so. However, there are plenty of things you can drop from a GPU or vector processor, and when you start having hundreds of cores it's a real issue.
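To put the scaling argument in rough numbers (the per-core overhead and die budgets below are made-up illustrative values, not measurements): a fixed legacy cost that is noise on a handful of big cores becomes a visible slice of the die once it's replicated across hundreds of simple ones.

    # Sketch of the "N copies of dead weight" point with assumed numbers.
    def legacy_fraction(cores, overhead_per_core, die_budget):
        # Fraction of the total transistor budget spent on replicated legacy features.
        return cores * overhead_per_core / die_budget

    # 4 big cores on a ~1e9-transistor die vs 256 small cores on a ~3e9-transistor die,
    # both assuming ~2e6 "dead weight" transistors per core (illustrative only).
    print(legacy_fraction(4,   2e6, 1e9))   # ~0.008 -> under 1%, lost in the noise
    print(legacy_fraction(256, 2e6, 3e9))   # ~0.17  -> ~17%, a real cost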

Still, with enough of a process advantage and enough manpower you can end up with something like the i7 2600, which has a near-useless GPU and a ridiculous pin count and still dominates all competition in its price range.


Is there a cost? Of course. But arguably it's in the noise on these chips. Knights Ferry and Knights Corner use a scalar x86 core derived from the P54C. How many transistors was that? About 3.3 million. By contrast, Nvidia's 16-core Fermi is a 3 billion transistor design. (No, Fermi doesn't have 512 cores; that's a marketing number based on declaring that a SIMD lane is a "CUDA core". If we do the same trick with MIC we start multiplying 50+ cores by 16 lanes and claiming 800 cores.)

How can we resolve this dissonance? Easily: ignoring the fixed-function and graphics-only parts of Fermi, most of the transistors are going into the caches, the floating point units and the interconnect. These are places MIC will also spend billions of transistors, but they're not carrying legacy dead weight from x86 history; the FPU is 16 wide and by definition must have a new ISA. The cost of the scalar cores will not be remotely dominant.
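A back-of-the-envelope check on that, using the figures quoted above (the round 50-core MIC count and 32 lanes per Fermi SM are assumptions for the sake of the arithmetic):

    # Sketch using the transistor figures quoted in this comment.
    p54c_transistors = 3.3e6   # scalar core derived from the P54C
    mic_cores        = 50      # assumed MIC core count
    fermi_budget     = 3e9     # Fermi's total transistor count

    # Even 50 scalar cores are a small fraction of a Fermi-sized budget.
    print(mic_cores * p54c_transistors / fermi_budget)   # ~0.055, i.e. ~5.5%

    # The lane-counting trick: 16 SMs * 32 lanes = "512 cores" for Fermi,
    # and 50 cores * 16 lanes = "800 cores" for MIC by the same logic.
    print(16 * 32, mic_cores * 16)                        # 512 800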

I'm not sure why you are concerned about the pin count on the processor, except perhaps if you are complaining about changing socket designs, which is a different argument. The i7 2600 fits in an LGA 1155 socket (i.e. 1155 pins), whereas Fermi was using a 1981-pin design on the compute SKUs. The Sandy Bridge CPU design is a fine one, and the GPU is rapidly improving (e.g. Ivy Bridge should be significantly better, and will be a 1.4 billion transistor design on the same 22nm process as Knights Corner).



