So this means that each core is running its own version of Red Hat or some other Linux? Does that mean that I can install my own software on those, ah, tiny machines?
Someone will probably get mad or think I'm stupid for asking this, but can I install node.js on each core?
Each card (not core) is running an embedded version of Linux. Most of the standard binaries are just symlinks to busybox. You can mount NFS shares to share data.
The embedded systems need specially compiled binaries. With the Intel compiler, this is usually achieved by turning on a flag at compile time. So yes, you can install and run your own software on the nodes.
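To make that concrete, here's a minimal sketch of what that workflow could look like. The -mmic flag and the mic0 hostname in the comments are my assumptions based on Intel's published docs, so check your own toolchain:

    /* hello_mic.c -- a trivial program to run natively on the card.
       Assumed build/deploy steps (verify against your toolchain):
           icc -mmic hello_mic.c -o hello_mic
           scp hello_mic mic0:/tmp/ && ssh mic0 /tmp/hello_mic   */
    #include <stdio.h>
    #include <unistd.h>

    int main(void) {
        /* The embedded Linux reports its logical CPUs like any other box;
           on a ~50-core card with 4 threads/core this should be around 200. */
        long ncpu = sysconf(_SC_NPROCESSORS_ONLN);
        printf("Hello from the coprocessor: %ld hardware threads\n", ncpu);
        return 0;
    }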
Think of graphics cards, since they're the most commonplace example of coprocessors. You don't install a game on your graphics card; rather, the system gains the ability to offload graphics processing to the card, in order to improve how well the system as a whole runs the game.
You can go the RTOS route, but each card looks like a 50-core, 200-thread machine with 8 GB of GDDR5 memory and can comfortably run a Linux or *BSD stack.
In fact, I'd love to see developers using them as their personal machines (Intel would probably have to make ATX motherboards for them) so that more and more software takes advantage of the growing number of threads available on our own machines. An entry-level laptop these days looks like 8 processors to the OS, and that number will only go up.
While these are x86 cores, and clearly capable of running desktop software, it wouldn't be a good experience as the primary CPU. The circuit size almost certainly means that these are in-order CPUs, probably with minimal superscalar capability, and likely a shallow pipeline. So scalar code is going to run terribly on them. Desktop usage is dominated by scalar CPU overhead.
That's precisely why it would be interesting - developers would have to work around these limitations. The number of cores will continue to go up and, if the software allows it, they can be simpler ones.
With all due respect: yawn. People have been predicting the end of scalar code now for a decade, and it's no closer than it was. Show me a SIMD Javascript interpreter, or DOM layout engine, or SSL implementation, or compression algorithm, or heap allocator, or...
Won't happen. Out-of-order, heavily superscalar, deeply pipelined cores have won in the market because they're better for 90% of the problems you will see. But there are better architectures for the remaining 10% (and of those, "3D games", "Bitcoin mining" and "Password cracking" seem to make up about 90%), so you'll see lots of churn in this space as people try to figure out the best and most future-proof architecture.
For me, I like this a lot. The Knights Corner architecture looks mostly identical to the AVX code you can run on Sandy/Ivy Bridge right now, just with a few new features and a different SIMD width. From a programmer's perspective, it's much cleaner than the hidden-behind-multiply-abstracted-layers-above-proprietary-secret-hardware nonsense being peddled by AMD and NVIDIA.
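To illustrate, here's a rough sketch of what I mean: the same plain C source feeds both targets, and only the compile flags change (the -xAVX and -mmic flags in the comment are my assumptions, not gospel):

    /* saxpy.c -- one vectorizable loop, two SIMD widths.
       Assumed flags: icc -xAVX saxpy.c ... for Sandy/Ivy Bridge,
                      icc -mmic saxpy.c ... for the card. */
    #include <stddef.h>

    void saxpy(size_t n, float a, const float *restrict x, float *restrict y) {
        /* restrict-qualified pointers and a straight loop are enough for the
           compiler to auto-vectorize: 8 floats/iteration with 256-bit AVX,
           16 floats/iteration with the card's 512-bit unit. */
        for (size_t i = 0; i < n; i++)
            y[i] = a * x[i] + y[i];
    }

The appeal is that there's no separate kernel language or opaque driver stack in the way; it's the same toolchain and memory model you already use on the host.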
I agree it won't run current applications (as in Eclipse, Word, Excel, Emacs) much better than current CPUs do. But I can imagine, for instance, a huge number of applications that don't run well on current architectures and would benefit from having a huge number of relatively simple cores.
My notebook currently has two kinds of cores - 2 amd64 ones and I don't know how many really dumb GPU cores. It would be great if my Gmail experience were snappier when I'm not dragging windows between monitors.
The first multi-processor machine I used (a 4-processor Pentium Pro) didn't impress me by being fast (it was, but not that much). It impressed me by being much smoother under load than my previous computer. It doesn't matter how sure you are it won't happen; it eventually will.
I agree with almost everything you said, but I take issue with a couple of your examples. First, depending on the cipher suite, SSL can in fact get significant speedups from SIMD. Second, while compression is a pretty serial task, if you compress in blocks, then you can parallelize across those blocks; I know that's not SIMD, but it also doesn't require fancy processor cores to take advantage of the parallelism. A lot of server tasks are similar to that: parallel enough that they could benefit from having a bunch of somewhat wimpy but power-efficient cores thrown at them.
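As a sketch of the block-compression point (compress_block below is a stand-in stub, not a real codec API; the structure is what matters):

    /* Parallel block compression sketch (OpenMP). Each block is independent,
       so the loop spreads trivially across many wimpy cores. */
    #include <stddef.h>
    #include <string.h>

    /* Stand-in "codec": just copies the block. Swap in a real compressor. */
    static size_t compress_block(const unsigned char *in, size_t len,
                                 unsigned char *out) {
        memcpy(out, in, len);
        return len;
    }

    void compress_all(const unsigned char *in, size_t total, size_t block,
                      unsigned char **out_blocks, size_t *out_lens) {
        long nblocks = (long)((total + block - 1) / block);
        #pragma omp parallel for schedule(dynamic)
        for (long i = 0; i < nblocks; i++) {
            size_t off = (size_t)i * block;
            size_t len = (off + block <= total) ? block : total - off;
            out_lens[i] = compress_block(in + off, len, out_blocks[i]);
        }
    }

It's thread-level rather than SIMD parallelism, as you say, but it's exactly the shape of work that a pile of power-efficient cores handles well.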
Hmm, the Mac Pro has Xeon processors and PCIe slots. People were complaining just recently about the lack of Mac Pro updates.
I assume some of Apple's professional video tools, at least, are CPU-bound. They'd probably love to have an "add 50 more processors" option for Mac Pros.
The Phi cores are really slow unless you use the SIMD unit; the typical way to do that would be OpenCL, but you can run OpenCL code faster on your GPU.
Okay, I'm a little confused. What exactly is a coprocessor? It looks like a graphics card. Will it replace the graphics card or work alongside it? Does it only handle highly parallel tasks, with the CPU handling serial tasks?
If the Xeon Phi is going to run Linux out of the box, then that's Intel basically declaring that they want this card to be supported by the open-source community from day one, correct?
Correct me if I'm wrong, but it seems to me that the Xeon Phi is only going to be useful to open-source developers early on; then, as libraries are built that make use of it, application developers will be able to build on top of those libraries. So we're probably a long way off (and likely a few shifts in coprocessor technology away) from a card like this being useful to the typical end user.
Now, I'd love to see one with hardware-assisted transactional memory like the newest POWER machines. A standard HTM implementation for x64 ISA machines would be a huge leap forward.
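To give a feel for what that would buy, here's a rough sketch of lock elision on a shared counter. The _xbegin/_xend/_xabort intrinsics follow the style of Intel's RTM interface and are used here purely as an assumed illustration of the concept, with an ordinary spinlock kept as the fallback path:

    /* Sketch of HTM-based lock elision (RTM-style intrinsics assumed). */
    #include <immintrin.h>
    #include <stdatomic.h>

    static atomic_int fallback_lock;   /* 0 = free, 1 = held */
    static long counter;

    void increment(void) {
        unsigned status = _xbegin();
        if (status == _XBEGIN_STARTED) {
            /* If someone holds the fallback lock, abort rather than race. */
            if (atomic_load_explicit(&fallback_lock, memory_order_relaxed))
                _xabort(0xff);
            counter++;                 /* transactional update, no lock taken */
            _xend();
        } else {
            /* Aborted (conflict, capacity, lock held, ...): take a spinlock. */
            while (atomic_exchange_explicit(&fallback_lock, 1, memory_order_acquire))
                ;
            counter++;
            atomic_store_explicit(&fallback_lock, 0, memory_order_release);
        }
    }

Uncontended threads commit without ever touching the lock, and the fallback path keeps things correct whenever a transaction aborts.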
Your link is about an Intel research chip being designed to mimic a human brain, built with memristors.
This one is about the commercial launch of Knights Corner, an add-on card from Intel that essentially stuffs 50 Pentium-class cores with enhanced vector instructions onto one chip.
This is the fruit of the troubled Larrabee project, which aimed to build a GPU out of x86 cores. Even with Intel's extreme process-node advantage, it couldn't get fast or power-efficient enough to come close to modern GPUs (it was several generations behind, and so was never released commercially).
Extreme process-node advantage? They are just one generation ahead at most (about two years). That's not nearly enough to overcome the architecture's inefficiency at GPU workloads, nor is it big enough to overcome its inefficiency in power consumption versus ARM chips, for that matter.
By published data, Intel's 45nm process, with products shipping in 2007, switches as fast as TSMC's 28nm, which only had products out in January of this year.
They only have ~2 years of advantage on density, but they reliably have 5+ years advantage on speed. I'd call that extreme.
I absolutely believe that once Intel actually releases Atom CPUs on their best process (so far, they have mostly used Atom to fill out production on old processes), it will beat ARM on performance/watt. When they move Atom to a competitive uarch (OoO, ffs), the race will stop being a race.
That's indeed what I was referring to. Intel's processes are typically superior to AMD/GloFo's at the same node, and miles beyond the respective bulk processes at TSMC. They also get there first.
Is it really 30% of your silicon, even discounting caches? I'd love to see some analysis of this (and a comparison against ARM) from an expert; I've heard everything from "x86 instruction decoding is the majority of the power drain" to "the ISA doesn't matter, it's everything behind the frontend that matters", and it's hard to find actual analysis. I guess that's partly because this is all highly confidential.
And I know even ARM has instructions that make chip designers want to get all stabbity (ldm, stm).
http://software.intel.com/en-us/blogs/2012/06/05/knights-cor...
http://software.intel.com/en-us/forums/showthread.php?t=1054...