For comparison, you can now get sub-35-watt Intel Ivy Bridge processors with 2 or 4 physical cores. I wouldn't be surprised to see them outperform these boards on a performance/watt or performance/$ basis, especially if you're running OpenCL-accelerated computations on the GPU.
The wattage measured here is not just the processor; it's the entire board.
I've done some benchmarks pitting Intel Atom processors (Atom 330s and D525s) against Tegra 3 processors. On average the Tegra 3s outperformed the Atoms 4x-8x in FLOP/s per watt.
Admittedly that is not the only important metric, but it gives an initial comparison between ARM and x86 performance.
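For reference, the metric itself is trivial to compute; here's a rough sketch with placeholder numbers (not my actual measurements), assuming you log sustained GFLOP/s from the benchmark and wall power from a meter:

    # FLOP/s-per-watt sketch; the GFLOP/s and wall-power figures are placeholders.
    measurements = {
        # board: (sustained GFLOP/s, wall power in watts) -- hypothetical values
        "Atom 330 board": (2.0, 30.0),
        "Tegra 3 board":  (1.5, 4.0),
    }

    for board, (gflops, watts) in measurements.items():
        print(f"{board}: {gflops / watts:.3f} GFLOP/s per watt")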
That being said, the Atoms themselves are not a good benchmark in performance/watt. I'd be more interested in a comparison against current-gen Xeon or Interlagos systems. Sounds silly, but ARM has been making progress and I could see them being used in the future as companions to GPUs in computing clusters. With current GPGPU computing models like OpenACC it doesn't really make sense to put 16 race horses (Interlagos) beside an ant colony (Fermi GPU), unless you're aiming for high flexibility.
Well, I think Intel wants Atom to compete with ARM for that market, so it may be relevant.
There's also Nvidia's Project Denver, which will probably come out in 2014. It's based on the 64 bit ARMv8 architecture, it's a custom CPU made in collaboration with ARM, and I think they want to pair it with their next-gen GPU architecture Maxwell. It's intended for servers and supercomputers.
I've heard about that Nvidia project. I think they're on the right track. The only thing that's missing is enough programmers (and thus software) for this model. As an example (and I'm saying this as a layman in terms of databases), I think DBMSs might be able to profit a lot from the GPGPU-based model. For high-read-traffic databases you could scale a system with n GPUs, based on how much "storage" the database needs (the storage being the GPU RAM, continuously mirrored to hard disks when writes occur).
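To make the scaling idea concrete, a toy sizing calculation (every number here is made up, just to illustrate the "n GPUs sized by storage" point):

    # Toy capacity sizing for the "GPU RAM as hot storage" idea; all figures hypothetical.
    import math

    hot_dataset_gb = 96      # read-heavy working set to keep resident in GPU RAM
    gpu_ram_gb = 6           # usable memory per GPU
    replicas = 2             # mirror each shard on two GPUs for read scaling

    gpus_needed = math.ceil(hot_dataset_gb / gpu_ram_gb) * replicas
    print(f"{gpus_needed} GPUs needed")   # 32 in this example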
I think the consumption of the rest of the system is fairly negligible, especially considering how inaccurate TDP figures are to begin with. E.g. this[1] system with a 95W TDP CPU draws 92 watts from the wall, and that's with off-the-shelf parts. Swap in a low-power CPU, optimize the rest for power consumption, and I wouldn't be surprised if you got below 50W.
You're giving Intel graphics as a pro for servers using OpenCL? Either way, I don't know about this setup, but next year a setup with Cortex A15 and Mali T658 will have shared cache between CPU and GPU, which should be a lot more efficient than anything Intel or even AMD has today regarding GPU compute. From what I understand, cache "latency" between CPU and GPU is a pretty big problem in the desktop space, and this ARM design should make it a lot better.
> Cortex A15 and Mali T658 will have shared cache between CPU and GPU, which should be a lot more efficient than anything Intel or even AMD has today regarding GPU compute.
Intel has had a shared last-level cache (LLC in the diagram below) between the CPU and the on-die GPU since January 2011:
OK, but then maybe you'd want a small and cheap cluster to gather knowledge on cluster deployment and management. I'm certainly interested in that type of stuff. I guess you'll hit scaling problems sooner on an underpowered motherboard. What do you think?
Oh, and as others have already said, the wattage is for the whole board. That said, I'm curious how they will actually power this cluster.
I may be wrong, but I think the PandaBoards are not really meant to be put in servers (they are meant as development boards, right?), and they are significantly more expensive than if you bought a 12-core ARM cluster specifically meant to be used as a server.
I can't imagine that the 96-core cluster would be either power efficient or cost efficient compared to Intel offerings. Assuming 3 watts and $100 per core (which align fairly well with the figures presented in the article), a 96-core cluster would draw nearly 300 watts and cost nearly $10,000.
Contrast that with, e.g., the Intel E5-2650L[1]: 8 cores, 16 threads, 70 watts and a $1200 price. Obviously the whole-system cost would be much higher, but I find it hard to believe that you couldn't get/build two 2-socket systems with such CPUs for $10,000. The CPUs alone would draw 280 watts, which would put whole-system power consumption in the same ballpark as the ARM cluster.
The remaining question is performance. The ARM cluster would have 50% more (logical) cores, but I'd imagine Intel would have far higher per-thread performance. And for lots of tasks the 100M Ethernet of the PandaBoards as an interconnect could become a bottleneck.
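Back-of-the-envelope, using the assumed figures above (the per-core numbers for the ARM boards are estimates, and the Xeon figures cover the CPUs only):

    # ARM cluster vs. two 2-socket E5-2650L systems; rough numbers only.
    arm_cores, arm_w_per_core, arm_usd_per_core = 96, 3, 100
    print("ARM:", arm_cores * arm_w_per_core, "W,",
          arm_cores * arm_usd_per_core, "USD")                  # 288 W, 9600 USD

    xeon_cpus, threads_per_cpu, tdp_w, price_usd = 4, 16, 70, 1200
    print("Xeon:", xeon_cpus * threads_per_cpu, "threads,",
          xeon_cpus * tdp_w, "W (CPUs only),",
          xeon_cpus * price_usd, "USD (CPUs only)")
    # 64 threads, 280 W, 4800 USD -- leaving ~5200 USD for boards, RAM, PSUs, etc.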
It's for Phoronix to test new ARM-based software, mainly benchmarking. From the second paragraph:
> This cluster will be used for delivering some interesting ARM Linux benchmarks, of course! It's also being used for Phoronix Test Suite purposes for creating more MPI/cluster benchmarks for some of the commercial/enterprise clients, coming up with more ARM Linux benchmarks in general, and this hardware is also going to be part of a much larger (~96+ ARM core cluster - details to be shared at a later time) cluster.
I like the idea of independently replaceable boards, but the drying rack isn't the answer. It may be great for heat dissipation, but you'd be surprised how many dangers are present in an apparently empty room (dust, ceiling leaks, your hipster keychain, etc.).
Why isn't there a market (comparable to the one for boards like these) for enclosures, "home racks", or something similar to house these things? Or is there a market but this guy just wanted to save a few quid?
Six dual-core PandaBoard ES boards stuck in a wooden dish-drying rack :) Each one draws a max of 5-6 watts under heavy load, so the whole "rack" should stay around 30 watts during cluster benchmarking etc. It's not completely assembled yet, so there are no real numbers. Total cost: about $1500.
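For anyone doing the math, a quick sanity check on those figures (the max draw per board is my estimate under load; the rest just follows from it):

    # Rough totals for the rack: 6 boards, 2 cores each, ~5-6 W max per board, ~$1500 total.
    boards, cores_per_board, cost_total = 6, 2, 1500
    w_low, w_high = 5, 6

    print(f"max draw: {boards * w_low}-{boards * w_high} W")                  # 30-36 W
    print(f"cost per core: ${cost_total / (boards * cores_per_board):.0f}")   # ~$125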
I was seriously considering trying something like this with either the Raspberry Pi or the APC <http://apc.io/>.
It looks like one PandaBoard can roughly match the CPU horsepower of four APC boards at a slightly lower price, so that's good to know. The Raspberry Pi might still beat the PandaBoard in terms of purchase-cost-to-megahertz ratio. We'll see. :-)
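Rough MHz-per-dollar numbers I'm working from (prices and clocks are from memory, so treat them as assumptions):

    # Crude purchase-cost-to-MHz comparison; prices/clocks below may be off.
    boards = {
        # name: (cores, MHz per core, price in USD)
        "PandaBoard ES": (2, 1200, 182),
        "APC":           (1,  800,  49),
        "Raspberry Pi":  (1,  700,  35),
    }

    for name, (cores, mhz, price) in boards.items():
        print(f"{name}: {cores * mhz / price:.1f} MHz per dollar")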
Don't just look at MHz/$. The Raspberry Pi has an ARMv6-compatible core vs. the PandaBoard's ARMv7-A cores, i.e., a Raspberry Pi MHz does less work than a PandaBoard MHz.
Do you know of any place that quantifies the actual performance difference between the ARM versions? I just want to see the general improvement over time.
Might be handy for a cheap memcached/Redis cluster. I wonder if there are boards with a full 4 GB of RAM? A 1 or 10 gigabit network would probably be better for that, though.
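Fanning keys out across a pile of small-RAM boards is the easy part; a minimal client-side sharding sketch (node addresses and port are made up, and real memcached/Redis clients do this for you, usually with consistent hashing):

    # Minimal client-side key sharding across many small nodes; addresses are hypothetical.
    import hashlib

    nodes = [f"10.0.0.{i}:11211" for i in range(1, 13)]   # e.g. one memcached per board

    def node_for(key: str) -> str:
        digest = int(hashlib.md5(key.encode()).hexdigest(), 16)
        return nodes[digest % len(nodes)]

    print(node_for("session:12345"))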
http://ark.intel.com/products/65703/Intel-Core-i5-3470T-Proc...
http://ark.intel.com/products/65735
http://ark.intel.com/products/65714/Intel-Core-i7-3517U-Proc...