Intel Announces Xeon Phi Co-Processors (anandtech.com)
76 points by mrb on June 19, 2012 | 43 comments




So this means that each core is running its own version of Red Hat or some other Linux? Does that mean I can install my own software on those, ah, tiny machines?

Someone will probably get mad or think I'm stupid for asking this, but can I install node.js on each core?


Each card (not core) is running an embedded version of Linux. Most of the binaries are symlinks to busybox. You can mount NFS shares to share data.

The embedded systems need specially compiled binaries. With the Intel compiler, this is usually achieved by turning on a flag at compile time. So yes, you can install and run your own software on the nodes.
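
Roughly, assuming the Intel toolchain with MIC support (the flag names and the copy/run steps below are from memory, so check the compiler docs), it looks something like this:

    /* hello_mic.c - sketch of a "native" build for the card. Assumptions:
     * Intel compiler with MIC support, card reachable over ssh as mic0.
     *
     *   icc -mmic -openmp hello_mic.c -o hello_mic   (cross-compile for the card)
     *   scp hello_mic mic0:/tmp/ && ssh mic0 /tmp/hello_mic
     */
    #include <stdio.h>
    #include <omp.h>

    int main(void)
    {
        #pragma omp parallel            /* one software thread per hardware thread */
        {
            #pragma omp single
            printf("running %d threads on the card\n", omp_get_num_threads());
        }
        return 0;
    }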


Think of graphics cards, since they're the most commonplace example of coprocessors. You don't install a game on your graphics card; rather, the system gains the ability to offload graphics processing to the card, in order to improve how well the system as a whole runs the game.


Except in this case, you can install Linux on it.


From my limited understanding: each card will appear as a 50-core Linux box.

Each core has a set of 16-wide vector instructions, and there is a small API for synchronization.


It's probably going to be a small RTOS on each core.

Or like it's done on the PS3, where one core schedules the others.

"but can I install node.js on each core?" maybe, but that's really not the idea


You can go the RTOS route, but each card looks like a 50-core, 200-thread machine with 8 GB of GDDR5 memory and can comfortably run a Linux or *BSD stack.
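
You can see that from userland just by asking the OS how many logical CPUs are online; a trivial sketch (plain Linux/glibc, nothing Phi-specific):

    /* cpus.c - print the number of online logical CPUs the OS exposes */
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        long n = sysconf(_SC_NPROCESSORS_ONLN);
        printf("%ld logical CPUs online\n", n);   /* ~200 on the card described above */
        return 0;
    }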

In fact, I'd love to see developers using them as their personal machines (Intel would probably have to make ATX motherboards for them) so that more and more software makes use of the growing number of threads available on our own machines. An entry-level laptop these days looks like 8 processors to the OS, and that number will only go up.


While these are x86 cores, and clearly capable of running desktop software, it wouldn't be a good experience as the primary CPU. The circuit size almost certainly means that these are in-order CPUs, probably with minimal superscalar capability, and likely a shallow pipeline. So scalar code is going to run terribly on them. Desktop usage is dominated by scalar CPU overhead.


That's precisely why it would be interesting - developers would have to work around these limitations. The number of cores will continue to go up and, if the software allows it, they can be simple ones.


With all due respect: yawn. People have been predicting the end of scalar code now for a decade, and it's no closer than it was. Show me a SIMD Javascript interpreter, or DOM layout engine, or SSL implementation, or compression algorithm, or heap allocator, or...

Won't happen. Out-of-order, heavily superscalar, deeply pipelined cores have won in the market because they're better for 90% of the problems you will see. But there are better architectures for the remaining 10% (and of those, "3D games", "Bitcoin mining" and "Password cracking" seem to make up about 90%), so you'll see lots of churn in this space as people try to figure out the best and most future-proof architecture.

For me, I like this a lot. The Knights Corner architecture looks mostly identical to the AVX code you can run on Sandy/Ivy Bridge right now, just with a few new features and a different SIMD width. From a programmer's perspective, it's much cleaner than the hidden-behind-multiply-abstracted-layers-above-proprietary-secret-hardware nonsense being peddled by AMD and NVIDIA.
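
To make the "same model, different width" point concrete, here's the sort of loop you write with AVX intrinsics on Sandy/Ivy Bridge today - 8 floats per operation; on Knights Corner the idea is the same with 16-wide (512-bit) registers and its own intrinsic spellings, which I won't guess at here:

    /* axpy8.c - 8-wide AVX version of y[i] += a * x[i]. Compile with -mavx.
     * On the coprocessor the same loop would use 16-wide vectors instead. */
    #include <immintrin.h>

    void axpy(float a, const float *x, float *y, int n)
    {
        __m256 va = _mm256_set1_ps(a);      /* broadcast a into all 8 lanes */
        int i;
        for (i = 0; i + 8 <= n; i += 8) {
            __m256 vx = _mm256_loadu_ps(x + i);
            __m256 vy = _mm256_loadu_ps(y + i);
            vy = _mm256_add_ps(vy, _mm256_mul_ps(va, vx));
            _mm256_storeu_ps(y + i, vy);
        }
        for (; i < n; i++)                  /* scalar tail */
            y[i] += a * x[i];
    }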

But it won't run your desktop. Ever. Sorry.


I agree it won't run current applications (as in Eclipse, Word, Excel, Emacs) much better than current CPUs do. But I can, for instance, imagine a huge number of applications that cannot run well on current architectures that would benefit from having a huge number of relatively simple cores.

My notebook currently has two kinds of cores - 2 amd64 ones and I don't know how many really dumb GPU cores. It would be great if my Gmail experience were snappier when I'm not dragging windows between monitors.

The first multi-processor machine (a 4-processor Pentium Pro) I used didn't impress me by being fast (it was, but not that much). It impressed me by being much smoother under load than my previous computer. It doesn't matter how sure you are it won't happen; it eventually will.


I agree with almost everything you said, but I take issue with a couple of your examples. First, depending on the cipher suite, SSL can in fact get significant speedups from SIMD. Second, while compression is a pretty serial task, if you compress in blocks, then you can parallelize across those blocks; I know that's not SIMD, but it also doesn't require fancy processor cores to take advantage of the parallelism. A lot of server tasks are similar to that: parallel enough that they could benefit from having a bunch of somewhat wimpy but power-efficient cores thrown at them.
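
As a sketch of the compress-in-blocks idea - plain zlib plus pthreads, nothing exotic; error handling and the per-block framing you'd need to reassemble the output are left out:

    /* parcomp.c - compress independent blocks of a buffer on separate threads.
     * Link with -lz -lpthread. Each block becomes its own zlib stream, so the
     * output would need per-block framing to be decompressed later. */
    #include <pthread.h>
    #include <stdlib.h>
    #include <zlib.h>

    #define NBLOCKS 4

    struct job {
        const Bytef *in;
        uLong in_len;
        Bytef *out;
        uLongf out_len;
    };

    static void *compress_block(void *arg)
    {
        struct job *j = arg;
        compress2(j->out, &j->out_len, j->in, j->in_len, 6);  /* level 6 = zlib default */
        return NULL;
    }

    void compress_parallel(const Bytef *buf, uLong len)
    {
        pthread_t tid[NBLOCKS];
        struct job jobs[NBLOCKS];
        uLong block = len / NBLOCKS;

        for (int i = 0; i < NBLOCKS; i++) {
            jobs[i].in = buf + i * block;
            jobs[i].in_len = (i == NBLOCKS - 1) ? len - i * block : block;
            jobs[i].out_len = compressBound(jobs[i].in_len);
            jobs[i].out = malloc(jobs[i].out_len);
            pthread_create(&tid[i], NULL, compress_block, &jobs[i]);
        }
        for (int i = 0; i < NBLOCKS; i++)
            pthread_join(tid[i], NULL);     /* jobs[i].out/out_len now hold each block */
    }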


OK, what's the idea then? How am I supposed to program the tiny machines/cores?


Have a look at OpenCL; I think you will use OpenCL or a similar framework to code for this device.
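
Roughly what the host side looks like with generic OpenCL 1.x boilerplate (error checks dropped for brevity); whether the card actually shows up as CL_DEVICE_TYPE_ACCELERATOR under Intel's runtime is my assumption:

    /* ocl_scale.c - minimal OpenCL host sketch: multiply a buffer by 2 on a
     * coprocessor-style device. Link with -lOpenCL. */
    #include <CL/cl.h>
    #include <stdio.h>

    static const char *src =
        "__kernel void scale(__global float *v) {"
        "    int i = get_global_id(0);"
        "    v[i] *= 2.0f;"
        "}";

    int main(void)
    {
        enum { N = 1024 };
        float data[N];
        for (int i = 0; i < N; i++) data[i] = (float)i;

        cl_platform_id platform;
        cl_device_id dev;
        clGetPlatformIDs(1, &platform, NULL);
        /* assumption: the card is exposed as an "accelerator" device */
        clGetDeviceIDs(platform, CL_DEVICE_TYPE_ACCELERATOR, 1, &dev, NULL);

        cl_context ctx = clCreateContext(NULL, 1, &dev, NULL, NULL, NULL);
        cl_command_queue q = clCreateCommandQueue(ctx, dev, 0, NULL);

        cl_program prog = clCreateProgramWithSource(ctx, 1, &src, NULL, NULL);
        clBuildProgram(prog, 1, &dev, NULL, NULL, NULL);
        cl_kernel k = clCreateKernel(prog, "scale", NULL);

        cl_mem buf = clCreateBuffer(ctx, CL_MEM_READ_WRITE | CL_MEM_COPY_HOST_PTR,
                                    sizeof(data), data, NULL);
        clSetKernelArg(k, 0, sizeof(buf), &buf);

        size_t global = N;
        clEnqueueNDRangeKernel(q, k, 1, NULL, &global, NULL, 0, NULL, NULL);
        clEnqueueReadBuffer(q, buf, CL_TRUE, 0, sizeof(data), data, 0, NULL, NULL);

        printf("data[1] = %f\n", data[1]);   /* expect 2.0 */
        return 0;
    }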


Have a look at http://ispc.github.com/ -

Intel SPMD Program Compiler

"An open-source compiler for high-performance SIMD programming on the CPU"


Seems like the tl;dr is: 50+ x86 cores with RAM and CPU soldered to PCIe boards.

In other words, a mix of CPU and GPU: a standard architecture (as opposed to the GPUs' proprietary ones), many cores, and faster memory pipelines.

Now, where did I put that functional programming book..


Hmm, the Mac Pro has Xeon processors and PCIe slots. People were complaining just recently about the lack of Mac Pro updates.

I assume some of Apple's professional video tools, at least, are CPU-bound. They'd probably love to have an "add 50 more processors" option for Mac Pros.


The Phi cores are really slow unless you use the SIMD unit; the typical way to do that would be OpenCL, but you can run OpenCL code faster on your GPU.


Okay, I'm a little confused. What exactly is a co-processor? It looks like a graphics card. Will it replace the graphics card or work alongside it? Does it only handle highly parallel tasks, with the processor handling serial tasks?


http://en.wikipedia.org/wiki/Coprocessor

Yes, the idea is that you will offload highly parallel tasks to it while the main CPU handles serial tasks.
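
With Intel's compiler the offload pattern looks roughly like the pragma below (syntax from memory of Intel's offload extensions, so treat it as a sketch that needs their toolchain):

    /* offload.c - sketch of offloading a parallel loop to the coprocessor while
     * the host keeps running serial code. Requires Intel's compiler/runtime. */
    #include <math.h>

    void square_roots(float *dst, const float *src, int n)
    {
        /* assumption: Intel's offload pragma ships the loop plus the named
         * arrays to the card; everything else stays on the host CPU */
        #pragma offload target(mic) in(src : length(n)) out(dst : length(n))
        #pragma omp parallel for
        for (int i = 0; i < n; i++)
            dst[i] = sqrtf(src[i]);
    }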


While we're adding relevant Wikipedia links,

http://en.wikipedia.org/wiki/Heterogeneous_computing


If Xeon Phi is going to run Linux out of the box, then that's Intel basically declaring that they want this card to be supported by the open-source community from day one, correct?

Correct me if I'm wrong, but it seems to me that the Xeon Phi is only going to be useful to open-source developers early on, and then, as libraries are built that make use of the Xeon Phi, application developers will be able to build on top of those libraries. So we're probably a long way off (and likely a few shifts in co-processor technology away) from a card like this being useful to the typical end user.


Realistically, Intel is going to provide the libraries, like the OpenCL runtime.


A little low on technical detail. "Works synergistically with Intel(R) Xeon(R) Processor", honestly!

Interesting though: will it be developed into a graphics board, returning to Larrabee? What will Nvidia and AMD (ATI) do in response?


Nvidia has Project Denver in the making, based on ARMv8, the 64-bit ARM architecture - a custom CPU paired with a Maxwell GPU.


Looks like Intel is trying to attack the CUDA/Stream market. I'm curious to see how NVidia and ATI will respond.


s/ATI/AMD/


Now, I'd love to see one with hardware-assisted transactional memory like the newest POWER machines. A standard HTM implementation for x64 ISA machines would be a huge leap forward.


The future "Haswell" chips are going to have support for transactional memory.

No news on these guys though. Give it time. In a few years these things will be getting up to speed in features/RAS and such.


Xeon works "synergistically" with Xeon Phi ;)


Any word on what the cost of this card will be?


The 80s called. They'd like their co-processors back!


You know that GPUs are coprocessors, right? This is analogous. The cores on here are small, power-efficient, and have a lot of SIMD.


I'm quite sure my Apple II didn't have 50+ cores.


That would be a funky looking apple...


[deleted]


Actually, this is something different.

Your link is about an Intel research chip designed to mimic a human brain, built with memristors.

This one is about the commercial launch of Knights Corner, an add-on card from Intel that essentially stuffs 50 Pentium-class cores with enhanced vector instructions onto one chip.

This is the fruit of the troubled Larrabee project, which aimed to build a GPU out of x86 cores; even with Intel's extreme process-node advantage, it could not get fast or power-efficient enough to come close to modern GPUs (it was several generations behind, and so was never released commercially).

Pretty cool couple of days for Intel, news wise.


Extreme process-node advantage? They are just one generation ahead at most (about 2 years). That's not nearly enough to overcome the architecture's inefficiency when it comes to doing GPU work, nor is it big enough to overcome the architecture's power-consumption inefficiency versus ARM chips, for that matter.


By published data, Intel's 45nm process, with products out in 2007, switches as fast as TSMC's 28nm, with products out in January of this year.

They only have ~2 years of advantage on density, but they reliably have a 5+ year advantage on speed. I'd call that extreme.

I absolutely believe that once Intel actually releases Atom CPUs on their best process (so far, they have mostly used them to fill out production on old processes), they will beat ARM on performance per watt. When they move Atom to a competitive uarch (OoO, finally), the race will stop being a race.


That's indeed what I was referring to. Intel's processes are typically superior to AMD/GloFo's at the same node, and miles beyond the respective bulk processes at TSMC. They also get there first.


Especially when trying to be efficient with the bag of hurt that is x86.

Now, x86 instructions are very efficient in some ways, but for raw performance you're wasting about 30% of your silicon dealing with them.


Is it really 30% of your silicon, even discounting caches? I'd love to see some analysis of this (and a comparison against ARM) from an expert; I've heard everything from "x86 instruction decoding is the majority of the power drain" to "the ISA doesn't matter, it's everything behind the front end that matters", and it's hard to find actual analysis. I guess that's partly because this is all highly confidential.

And I know even ARM has instructions that make chip designers want to get all stabbity (ldm, stm).


This may be a previous discussion of Intel, but it is certainly not a discussion of the Xeon Phi coprocessor.



