
While the performance seen here is nice, I'm curious to see the price/performance ratio. Running against an 8-core Xeon would not make sense if the closest Intel system price-wise is a quad 12-core Xeon... Obviously we are talking cloud here, so it might not even apply.

In my experience with Power7, the price/performance ratio is much worse on Power than on Intel systems. Maybe that has changed, but I'm not holding my breath, even if IBM seems much more aggressive on pricing with P8 than they were with P5-P7.

Quick calculation, absolutely unscientific:

Seeing that the price is $0.14/hour for the 6-core Xeon and $1.08/hour for the 176-core P8, it would have to be roughly 8-10x faster to justify the cost difference; I'm not sure that will be the case.
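
Spelling out the arithmetic (prices as quoted above, assuming the break-even speedup is simply the price ratio):

    xeon_price = 0.14           # $/hour for the 6-core Xeon instance
    p8_price = 1.08             # $/hour for the 176-core P8 instance
    breakeven = p8_price / xeon_price
    print(round(breakeven, 1))  # ~7.7, so call it roughly 8x before the P8 pays off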




The thing you're getting here is primarily throughput on a single image. Even if it's more expensive per core per hour, you can't discount that you'd have to work a lot harder to get the equivalent 30-box distributed solution to work properly, and even then it would have certain disadvantages owing to network latency.


This is interesting and I'd like to hear more opinions on it. My impression is that distributed computing has been eating Power/Sparc/Z processors' lunch for a long time now because software has made up for the deficiencies of coordinating 30 boxes. Do you, or do any others, believe that we are at an inflection point where the pendulum swings back in the direction of 'high-performance' processors like Power8, or will improvements in 'scale-out' ease-of-use and economies of scale continue to win the day?


The dominant use case for the last decade or so has been web servers hitting caches to do low-CPU low-causality CRUD operations. That looks unlikely to change in the next decade, so keep your Intel stock.

That said, for a lot of interesting use cases, like that king-hell postgres database sitting in the middle of the swarm, or video processing, or streams processing, or indeed any situation in which thousands-to-millions of simultaneous actors need to work on the same shared state, this sort of system starts looking real interesting.

As a thought experiment, think of this system like a GPU, except every single processor is a fully capable 2 GHz i5 running Unix, and instead of having to deal with the CUDA or OpenCL API, you can just write erlang (or haskell, or whatever) code and it will run. And instead of having 2-8G of RAM, you have 48G. And instead of having arcane debug tools, you have recon and gdb and ddd.
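
To make that concrete, here's a toy sketch (Python rather than erlang/haskell, and the worker count is purely illustrative): all the parallelism lives on one box, so fanning work out over the shared data is an ordinary library call instead of a cluster deployment.

    from multiprocessing import Pool

    def crunch(chunk):
        # CPU-bound work on one slice of the input, all in local RAM
        return sum(x * x for x in chunk)

    if __name__ == "__main__":
        data = list(range(1_000_000))
        workers = 176                                # illustrative: one per hardware thread
        chunks = [data[i::workers] for i in range(workers)]
        with Pool(workers) as pool:                  # every worker runs on the same machine
            total = sum(pool.map(crunch, chunks))
        print(total)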

I don't think there is a pendulum; I think there's a spectrum, and there always has been. Pragmatism should always rule, and your use case is not my use case. There isn't ever going to be an objective winner, no matter how close Intel may get to covering much of the sweet spot.


If you have a problem that behaves poorly in the face of the Network Fallacies, then you want to scale vertically.

RDBMSes are a classic example. So are some kinds of compute-heavy problems -- simulations with lots of coupled components, video compression, etc.


I would be extremely surprised if you needed 30 x86 boxes to reach the performance of a P8 box, on any type of workload. In my experience with P5-P7, they can be faster than x86 for certain workloads, but not by that much.


You can't really compare the different chip revs apples-to-apples. P6 was a completely different chip architecture with much higher clock speeds that IBM abandoned because it didn't perform well. They make a lot of changes in each chip rev.



