Intel Announces Xeon Phi Co-Processors (anandtech.com)
76 points by mrb on June 19, 2012 | 43 comments




So this means that each core is running its own version of Red Hat or some other Linux? Does that mean I can install my own software on those, ah, tiny machines?

Someone will probably get mad or think I'm stupid for asking this, but can I install node.js on each core?


Each card (not core) is running an embedded version of Linux. Most of the binaries are symlinks to busybox. You can mount NFS shares to share data.

The embedded systems need specially compiled binaries. With the Intel compiler, this is usually achieved by turning on a flag at compile time. So yes, you can install and run your own software on the nodes.
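
Roughly, assuming the Intel toolchain with MIC support (the flag names and the copy/run steps below are from memory, so check the compiler docs), it looks something like this:

    /* hello_mic.c - sketch of a "native" build for the card. Assumptions:
     * Intel compiler with MIC support, card reachable over ssh as mic0.
     *
     *   icc -mmic -openmp hello_mic.c -o hello_mic   (cross-compile for the card)
     *   scp hello_mic mic0:/tmp/ && ssh mic0 /tmp/hello_mic
     */
    #include <stdio.h>
    #include <omp.h>

    int main(void)
    {
        #pragma omp parallel            /* one software thread per hardware thread */
        {
            #pragma omp single
            printf("running %d threads on the card\n", omp_get_num_threads());
        }
        return 0;
    }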


Think of graphics cards, since they're the most commonplace example of coprocessors. You don't install a game on your graphics card; rather, the system gains the ability to offload graphics processing to the card, in order to improve how well the system as a whole runs the game.


Except in this case, you can install Linux on it.


From my limited understanding: each card will appear as a 50-core Linux box.

Each core has a set of 16-wide vector instructions, and there is a small API for synchronization.


It's probably going to be a small RTOS on each core.

Or like it's done on the PS3, where one core schedules the others.

"but can I install node.js on each core?" maybe, but that's really not the idea


You can go the RTOS route, but each card looks like a 50-core, 200-thread machine with 8 GB of GDDR5 memory and can comfortably run a Linux or *BSD stack.
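
You can see that from userland just by asking the OS how many logical CPUs are online; a trivial sketch (plain Linux/glibc, nothing Phi-specific):

    /* cpus.c - print the number of online logical CPUs the OS exposes */
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        long n = sysconf(_SC_NPROCESSORS_ONLN);
        printf("%ld logical CPUs online\n", n);   /* ~200 on the card described above */
        return 0;
    }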

In fact, I'd love to see developers using them as their personal machines (Intel would probably have to make ATX motherboards for them) so that more and more software makes use of the growing number of threads available on our own machines. An entry-level laptop these days looks like 8 processors to the OS, and that number will only go up.


While these are x86 cores, and clearly capable of running desktop software, it wouldn't be a good experience as the primary CPU. The circuit size almost certainly means that these are in-order CPUs, probably with minimal superscalar capability, and likely a shallow pipeline. So scalar code is going to run terribly on them. Desktop usage is dominated by scalar CPU overhead.


That's precisely why it would be interesting - developers would have to work around these limitations. The number of cores will continue to go up and, if the software allows it, they can be simple ones.


With all due respect: yawn. People have been predicting the end of scalar code now for a decade, and it's no closer than it was. Show me a SIMD Javascript interpreter, or DOM layout engine, or SSL implementation, or compression algorithm, or heap allocator, or...

Won't happen. Out-of-order, heavily superscalar, deeply pipelined cores have won in the market because they're better for 90% of the problems you will see. But there are better architectures for the remaining 10% (and of those, "3D games", "Bitcoin mining" and "Password cracking" seem to make up about 90%), so you'll see lots of churn in this space as people try to figure out the best and most future-proof architecture.

For me, I like this a lot. The Knights Corner architecture looks mostly identical to the AVX code you can run on Sandy/Ivy Bridge right now, just with a few new features and a different SIMD width. From a programmer's perspective, it's much cleaner than the hidden-behind-multiply-abstracted-layers-above-proprietary-secret-hardware nonsense being peddled by AMD and NVIDIA.
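
To make the "same model, different width" point concrete, here's the sort of loop you write with AVX intrinsics on Sandy/Ivy Bridge today - 8 floats per operation; on Knights Corner the idea is the same with 16-wide (512-bit) registers and its own intrinsic spellings, which I won't guess at here:

    /* axpy8.c - 8-wide AVX version of y[i] += a * x[i]. Compile with -mavx.
     * On the coprocessor the same loop would use 16-wide vectors instead. */
    #include <immintrin.h>

    void axpy(float a, const float *x, float *y, int n)
    {
        __m256 va = _mm256_set1_ps(a);      /* broadcast a into all 8 lanes */
        int i;
        for (i = 0; i + 8 <= n; i += 8) {
            __m256 vx = _mm256_loadu_ps(x + i);
            __m256 vy = _mm256_loadu_ps(y + i);
            vy = _mm256_add_ps(vy, _mm256_mul_ps(va, vx));
            _mm256_storeu_ps(y + i, vy);
        }
        for (; i < n; i++)                  /* scalar tail */
            y[i] += a * x[i];
    }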

But it won't run your desktop. Ever. Sorry.


I agree it won't run current applications (as in Eclipse, Word, Excel, Emacs) much better than current CPUs do. But I can, for instance, imagine a huge number of applications that cannot run well on current architectures that would benefit from having a huge number of relatively simple cores.

My notebook currently has two kinds of cores - 2 amd64 ones and I don't know how many really dumb GPU cores. It would be great if my Gmail experience were snappier when I'm not dragging windows between monitors.

The first multi-processor machine (a 4-processor Pentium Pro) I used didn't impress me by being fast (it was, but not that much). It impressed me by being much smoother under load than my previous computer. It doesn't matter how sure you are it won't happen; it eventually will.


I agree with almost everything you said, but I take issue with a couple of your examples. First, depending on the cipher suite, SSL can in fact get significant speedups from SIMD. Second, while compression is a pretty serial task, if you compress in blocks, then you can parallelize across those blocks; I know that's not SIMD, but it also doesn't require fancy processor cores to take advantage of the parallelism. A lot of server tasks are similar to that: parallel enough that they could benefit from having a bunch of somewhat wimpy but power-efficient cores thrown at them.
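
As a sketch of the compress-in-blocks idea - plain zlib plus pthreads, nothing exotic; error handling and the per-block framing you'd need to reassemble the output are left out:

    /* parcomp.c - compress independent blocks of a buffer on separate threads.
     * Link with -lz -lpthread. Each block becomes its own zlib stream, so the
     * output would need per-block framing to be decompressed later. */
    #include <pthread.h>
    #include <stdlib.h>
    #include <zlib.h>

    #define NBLOCKS 4

    struct job {
        const Bytef *in;
        uLong in_len;
        Bytef *out;
        uLongf out_len;
    };

    static void *compress_block(void *arg)
    {
        struct job *j = arg;
        compress2(j->out, &j->out_len, j->in, j->in_len, 6);  /* level 6 = zlib default */
        return NULL;
    }

    void compress_parallel(const Bytef *buf, uLong len)
    {
        pthread_t tid[NBLOCKS];
        struct job jobs[NBLOCKS];
        uLong block = len / NBLOCKS;

        for (int i = 0; i < NBLOCKS; i++) {
            jobs[i].in = buf + i * block;
            jobs[i].in_len = (i == NBLOCKS - 1) ? len - i * block : block;
            jobs[i].out_len = compressBound(jobs[i].in_len);
            jobs[i].out = malloc(jobs[i].out_len);
            pthread_create(&tid[i], NULL, compress_block, &jobs[i]);
        }
        for (int i = 0; i < NBLOCKS; i++)
            pthread_join(tid[i], NULL);     /* jobs[i].out/out_len now hold each block */
    }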


OK, what's the idea then? How am I supposed to program the tiny machines/cores?


Have a look at OpenCL; I think you will use OpenCL or a similar framework to code for this device.
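
Roughly what the host side looks like with generic OpenCL 1.x boilerplate (error checks dropped for brevity); whether the card actually shows up as CL_DEVICE_TYPE_ACCELERATOR under Intel's runtime is my assumption:

    /* ocl_scale.c - minimal OpenCL host sketch: multiply a buffer by 2 on a
     * coprocessor-style device. Link with -lOpenCL. */
    #include <CL/cl.h>
    #include <stdio.h>

    static const char *src =
        "__kernel void scale(__global float *v) {"
        "    int i = get_global_id(0);"
        "    v[i] *= 2.0f;"
        "}";

    int main(void)
    {
        enum { N = 1024 };
        float data[N];
        for (int i = 0; i < N; i++) data[i] = (float)i;

        cl_platform_id platform;
        cl_device_id dev;
        clGetPlatformIDs(1, &platform, NULL);
        /* assumption: the card is exposed as an "accelerator" device */
        clGetDeviceIDs(platform, CL_DEVICE_TYPE_ACCELERATOR, 1, &dev, NULL);

        cl_context ctx = clCreateContext(NULL, 1, &dev, NULL, NULL, NULL);
        cl_command_queue q = clCreateCommandQueue(ctx, dev, 0, NULL);

        cl_program prog = clCreateProgramWithSource(ctx, 1, &src, NULL, NULL);
        clBuildProgram(prog, 1, &dev, NULL, NULL, NULL);
        cl_kernel k = clCreateKernel(prog, "scale", NULL);

        cl_mem buf = clCreateBuffer(ctx, CL_MEM_READ_WRITE | CL_MEM_COPY_HOST_PTR,
                                    sizeof(data), data, NULL);
        clSetKernelArg(k, 0, sizeof(buf), &buf);

        size_t global = N;
        clEnqueueNDRangeKernel(q, k, 1, NULL, &global, NULL, 0, NULL, NULL);
        clEnqueueReadBuffer(q, buf, CL_TRUE, 0, sizeof(data), data, 0, NULL, NULL);

        printf("data[1] = %f\n", data[1]);   /* expect 2.0 */
        return 0;
    }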


Have a look at http://ispc.github.com/ -

Intel SPMD Program Compiler

"An open-source compiler for high-performance SIMD programming on the CPU"


Seems like the tl;dr is: 50+ x86 cores with RAM and CPU soldered to PCIe boards.

In other words, a mix of CPU and GPU: a standard architecture (as opposed to the GPUs' proprietary ones), many cores, and faster memory pipelines.

Now, where did I put that functional programming book..


Hmm, the Mac Pro has Xeon processors and PCIe slots. People were complaining just recently about the lack of Mac Pro updates.

I assume some of Apple's professional video tools, at least, are CPU-bound. They'd probably love to have an "add 50 more processors" option for Mac Pros.


The Phi cores are really slow unless you use the SIMD unit; the typical way to do that would be OpenCL, but you can run OpenCL code faster on your GPU.


Okay, I'm a little confused. What exactly is a co-processor? It looks like a graphics card. Will it replace the graphics card or work alongside it? Does it only handle highly parallel tasks, with the processor handling serial tasks?


http://en.wikipedia.org/wiki/Coprocessor

Yes, the idea is that you will offload highly parallel tasks to it while the main CPU handles serial tasks.
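
With Intel's compiler the offload pattern looks roughly like the pragma below (syntax from memory of Intel's offload extensions, so treat it as a sketch that needs their toolchain):

    /* offload.c - sketch of offloading a parallel loop to the coprocessor while
     * the host keeps running serial code. Requires Intel's compiler/runtime. */
    #include <math.h>

    void square_roots(float *dst, const float *src, int n)
    {
        /* assumption: Intel's offload pragma ships the loop plus the named
         * arrays to the card; everything else stays on the host CPU */
        #pragma offload target(mic) in(src : length(n)) out(dst : length(n))
        #pragma omp parallel for
        for (int i = 0; i < n; i++)
            dst[i] = sqrtf(src[i]);
    }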


While we're adding relevant Wikipedia links,

http://en.wikipedia.org/wiki/Heterogeneous_computing


If Xeon Phi is going to run Linux out of the box, then that's Intel basically declaring that they want this card to be supported by the open-source community from day one, correct?

Correct me if I'm wrong, but it seems to me that the Xeon Phi is only going to be useful to open-source developers early on, and then, as libraries are built that make use of the Xeon Phi, application developers will be able to build on top of those libraries. So we're probably a long way off (and likely a few shifts in co-processor technology away) from a card like this being useful to the typical end user.


Realistically, Intel is going to provide the libraries, like the OpenCL runtime.


A little low on technical detail. "Works synergistically with Intel(R) Xeon(R) Processor", honestly!

Interesting though: will it be developed into a graphics board, returning to Larrabee? What will Nvidia and AMD (ATI) do in response?


Nvidia has Project Denver in the making, based on ARMv8, the 64-bit ARM architecture - a custom CPU paired with a Maxwell GPU.


Looks like Intel is trying to attack the CUDA/Stream market. I'm curious to see how NVidia and ATI will respond.


s/ATI/AMD/


Now, I'd love to see one with hardware-assisted transactional memory like the newest POWER machines. A standard HTM implementation for x64 ISA machines would be a huge leap forward.


The future "Haswell" chips are going to have support for transactional memory.

No news on these guys though. Give it time. In a few years these things will be getting up to speed in features/RAS and such.


Xeon works "synergistically" with Xeon Phi ;)


Any word on what the cost of this card will be?


The 80s called. They'd like their co-processors back!


You know that GPUs are coprocessors, right? This is analogous. The cores on here are small, power-efficient, and have a lot of SIMD.


I'm quite sure my Apple II didn't have 50+ cores.


That would be a funky looking apple...


[deleted]


Actually, this is something different.

Your link is about an Intel research chip designed to mimic a human brain, built with memristors.

This one is about the commercial launch of Knights Corner, an add-on card from Intel that essentially stuffs 50 Pentium-class cores with enhanced vector instructions onto one chip.

This is the fruit of the troubled Larrabee project, which aimed to build a GPU out of x86 cores; even with Intel's extreme process-node advantage, it could not get fast or power-efficient enough to come close to modern GPUs (it was several generations behind, and so was never released commercially).

Pretty cool couple of days for Intel, news wise.


Extreme process-node advantage? They are just one generation ahead at most (about 2 years). That's not nearly enough to overcome the architecture's inefficiency when it comes to doing GPU work, nor is it big enough to overcome the architecture's power-consumption inefficiency versus ARM chips, for that matter.


By published data, Intel's 45nm process, with products out in 2007, switches as fast as TSMC's 28nm, with products out in January of this year.

They only have ~2 years of advantage on density, but they reliably have a 5+ year advantage on speed. I'd call that extreme.

I absolutely believe that once Intel actually releases Atom CPUs on their best process (so far, they have mostly used them to fill out production on old processes), they will beat ARM on performance per watt. When they move Atom to a competitive uarch (OoO, finally), the race will stop being a race.


That's indeed what I was referring to. Intel's processes are typically superior to AMD/GloFo's at the same node, and miles beyond the respective bulk processes at TSMC. They also get there first.


Especially when trying to be efficient with the bag of hurt that is x86.

Now, x86 instructions are very efficient in some ways, but for raw performance you're wasting about 30% of your silicon dealing with them.


Is it really 30% of your silicon, even discounting caches? I'd love to see some analysis of this (and a comparison against ARM) from an expert; I've heard everything from "x86 instruction decoding is the majority of the power drain" to "the ISA doesn't matter, it's everything behind the front end that matters", and it's hard to find actual analysis. I guess that's partly because this is all highly confidential.

And I know even ARM has instructions that make chip designers want to get all stabbity (ldm, stm).


This may be a previous discussion of Intel, but it is certainly not a discussion of the Xeon Phi coprocessor.



