Qualcomm Demos 48-Core Centriq 2400 Server SoC in Action, Begins Sampling (anandtech.com)
119 points by SoapSeller on Dec 17, 2016 | 36 comments



I remember this from my work at an ad startup, RevJet.

In one meeting we had a typical discussion with the ops guys:

- "why wouldn't we optimise our hardware utilisation by doing things a, b, and c."

- "hardware is crap cheap these days. If you need more capacity, just throw more servers at that"

- "is $24k a month in new servers crap cheap by your measure?"

- "comparatively to the amount of how much money these servers will make the same month, it is crap cheap. It is just a little less than an annual cost of mid-tier software dev in Russian office. We account only 12% increase in our revenue due to algorithmic improvements and almost 80 to more traffic we handle. A new server pays back the same month, and you and other devs pay off only in 2 years"

This pretty much summarises the viewpoint of a typical "big dot com ops manager" on hardware these days.


This is entirely true.

However, there's a class of companies which don't buy servers individually but by the datacenter (or large parts of one).

If your company is building custom ASICs or FPGAs, or is one of the main customers of an entire chip architecture (e.g. POWER9), then the math is different.


This is somewhat true. However, raising gross margin by cutting depreciation is a good thing. And throw in "green" for karma points. Hardware that takes less space and power while delivering more performance will make most good 'big dot com ops managers' happier.


Except a new architecture brings with it non-trivial software compatibility and operational overhead. If you are 100% in control of your software, the former is manageable, but how many of us really are?

The trick here is to make the new platform attractive enough it survives the stupid test, i.e. I'd be stupid not to try this out! That implies either very cheap or very fast or.. both. I presented on this at Linaro Connect USA in 2014: https://www.slideshare.net/mobile/linaroorg/leg-keynotekiko-...


This only applies at small scale. Once software runs on many machines, the cost of increasing performance easily pays for itself.


I wonder if people will actually get documentation for these, unlike many of their other devices. I'd hate to see server kernels end up as much of a wreck as phone kernels.


I can't really go into details because of NDAs (oh the irony) but these machines will boot fully open source and upstream kernels out of the box. We are very much trying to avoid the problems encountered with phone-grade ARM chips.


Will there be datasheets and other programming documentation available? ARM core documentation is available, but everything else on the SoC matters too. Just because there is open-source code for it doesn't mean it's been documented, because the code only shows "how" and not "why".

Qualcomm is notoriously closed with docs, about the same as Broadcom. Intel is starting to close up, but is still more open because of the x86/PC legacy.


I don't know (and if I did know, I couldn't say at present).

However, the points raised in this subthread are absolutely correct. Having upstream kernel support is vital so that any distro boots on any h/w and so that security fixes can be applied. Having good quality drivers is important, and documentation is very helpful too.

Red Hat is taking the high road here and working with server manufacturers to make sure that upstream kernels work painlessly on ARM server hardware. We don't think that ARM in the datacenter can be a success otherwise.


> I can't really go into details because of NDAs

This does not instill confidence in their seriousness about documentation.


My thoughts exactly.

If Qualcomm thinks they can pull the same crap in the server space with proprietary modules and living 'off tree', they're in for a big surprise.

Let's see what happens when a sysadmin can't apply a security fix or a needed kernel upgrade because Qualcomm's "proprietary driver" hasn't been updated.


In my opinion, the way for ARM to succeed in the server space is:

1) One quarter the price for equivalent performance on server tasks (e.g. $500 for the 48-core Qualcomm vs. $2000 for a 16-core/32-thread Intel). The problem with the 48-core Cavium ARM chip, for example, was poor per-core performance; Qualcomm could fix that with better IPC (a big enough L3 cache and 3- or 4-wide out-of-order execution).

2) One quarter the power usage

3) Reliable and affordable motherboards

4) Devices at similar prices to the Intel ecosystem (high-performance Ethernet, InfiniBand, PCIe SSDs, etc.)


"One quarter the price" ... "One quarter the power usage" ...

The first is very unrealistic, because this is a market and ARM chip makers aren't Santa Claus. The second seems very, very unrealistic, because Intel already has the most efficient "big iron"-ish CPUs (just compare SAP benchmarks vs. power usage of Intel-based vs. POWER-based systems, for instance), and has been under continuous pressure for more than a decade to make things more efficient (energy use, and the costs that follow from it, are a major cost factor for all of Intel's biggest server customers). That alone doesn't mean it's impossible to do better, of course, but it's a strong hint that if no one else has managed to get close to Intel there, it may be a hard thing to pull off. In any case, I think a four-fold efficiency increase is expecting way too much.

10% better power efficiency at the same performance level would already signal a huge win for ARM.


No, actually he is exactly right. 10% better would make no difference to the typical buyer, and to complicate things further, any TCO guesstimate would need to take into account managing a new architecture in a deployment, which comes with its own hard-to-measure overheads.


Define "typical buyer". 10-15% instant energy savings would be one heck of a reason for FB/Google/Amazon/Microsoft to veer towards ARM for their clouds, especially if the vendor can plausibly show that they have more potential there.

These are servers, not hypedisruptionmarkets. There has never been a 400% efficiency increase in a single generation of anything.


> 10-15% instant energy savings would be one heck of a reason for FB/Google/Amazon/Microsoft to veer towards ARM for their clouds

I'm not sure the upfront effort/cost would be worth it for that savings alone, especially when Intel will just come along and say they'll have that savings in an x86 chip in 9 months without any software dev required.

What this could do for the big DC companies, though, is provide them a lever to keep Intel on that path. That lever alone might be worth all the dev effort required to support 2 architectures in a DC for some customers - and even if it's not worth it empirically, it may make them feel like it is.


Except for the core count, there are no other specs like frequency, power, or cache. Are there any Qualcomm docs?


No specs, no benchmarks, no looking inside the case...


When they're available to buy, which won't be long now, you'll be able to find out all of that. The cases aren't going to be welded shut :-)

It's not surprising that Qualcomm don't publish this because they themselves probably don't know the yields from each bin in the final process.


What modifications need to be made to the Linux kernel/TCP stack to take advantage of 48+ cores and achieve more linear scalability?

Are there any real-life experiences? Would a different TCP stack such as mTCP[1] suffice?

1. http://shader.kaist.edu/mtcp/
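
One userspace-side technique that helps here, independent of any replacement stack, is SO_REUSEPORT (Linux 3.9+): every worker opens its own listening socket on the same port and the kernel load-balances incoming connections across them, so there is no single contended accept queue. A rough sketch of the idea, not anyone's production code (the port and reply are illustrative, Linux-only):

    import os
    import socket

    def serve(worker_id, port=8080):
        # Each worker opens its own listening socket on the same port.
        # With SO_REUSEPORT the kernel hashes incoming connections across
        # all listeners, so there is no shared, contended accept queue.
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
        sock.bind(("0.0.0.0", port))
        sock.listen(128)
        while True:
            conn, _addr = sock.accept()
            conn.sendall(b"HTTP/1.0 200 OK\r\n\r\nhello from worker %d\n" % worker_id)
            conn.close()

    if __name__ == "__main__":
        ncpus = os.cpu_count() or 1
        # One process per core, each pinned to its own CPU so cache and
        # NUMA locality are preserved (sched_setaffinity is Linux-only).
        for cpu in range(ncpus):
            if os.fork() == 0:
                os.sched_setaffinity(0, {cpu})
                serve(cpu)  # never returns
        for _ in range(ncpus):
            os.wait()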


Intel is quite happy to ship CentOS Linux on its 64-core (256 hardware thread) Xeon Phi Knights Landing dev boxes:

http://dap.xeonphi.com/

Here is Linux htop with 288 hardware threads on the higher-end Knights Landing:

http://www.admin-magazine.com/var/ezflow_site/storage/images...


Linux is already pretty comfy at 48+ cores (and more), probably especially so on a single socket.


> What modifications need to be made to the Linux kernel/TCP stack to take advantage of 48+ cores and achieve more linear scalability?

https://xkcd.com/619/ isn't a joke.

Linux is the default operating system of huge NUMA systems with hundreds of CPUs / thousands of cores.
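
If you're curious what topology the scheduler is working with on one of these boxes, Linux exposes it under sysfs. A quick sketch (the sysfs paths are the standard Linux ones; the node layout will obviously vary by machine):

    import os

    def numa_topology(base="/sys/devices/system/node"):
        # Each nodeN directory is one NUMA node; its cpulist file names
        # the CPUs local to that node's memory controller.
        nodes = {}
        for entry in sorted(os.listdir(base)):
            if entry.startswith("node") and entry[4:].isdigit():
                with open(os.path.join(base, entry, "cpulist")) as f:
                    nodes[int(entry[4:])] = f.read().strip()
        return nodes

    if __name__ == "__main__":
        for node, cpus in sorted(numa_topology().items()):
            print("node %d: cpus %s" % (node, cpus))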


They have put quite a bit of work into scaling, but that doesn't mean it's perfect yet.


Yes, if you want to run TCP/IP on it. But if you have the budget, it takes just a little more cash to get rid of packet I/O altogether, and the network-stack overhead along with it.


If you like 48+ core ARM servers, Cavium's ThunderX offering has been available for some time.


Cavium is about on par with an old E5-2670, which goes for about $90 on eBay. So while at some future point ARM server chips will likely be a good value, Cavium's ThunderX is definitely not it.


There's no reason to think this will be significantly faster than that. ARM evolved as a low-power architecture back when Intel just didn't have any low-power CPUs. ARM is a lot faster these days, but only because architectural complexity, and with it power usage, has increased. There is no free lunch here. ARM isn't super efficient at all; it's just low power.

This chip has applications, but probably not in compute-heavy data centers. Networking gear maybe? Give each port in a switch an ARM core?


Do you believe Qualcomm's will be any better value?


No clue; I guess we need to wait for benchmarks.


I read somewhere that ARM was growing fast because it hadn't been as thoroughly developed as x86. So right now ARM manufacturers are picking the lowest-hanging fruit, so to speak.

Is ARM going to hit a limit soon like x86 has?

Also, in the comments on the article, someone says that the x86 architecture is outdated compared to an SoC architecture.

Is this true? Does this mean we'll see SoCs on the desktop in the near future?


Yes and no. ARM is growing fast, and it is growing fast because there are lots of manufacturers making ARM CPUs.

First there is diversity: while Intel or Atmel might announce three new CPUs with two or three variants each in a year, ARM has ten manufacturers each with several new offerings, which adds up to many, many different CPUs.

The ARM "tax" (licensing fee) is lower then the minimum Intel "margin" (what they demand over the cost to produce) so the device maker's margins are better.

ARM is more open about its various interconnects, so it is easy^h^h^h^hpossible to add features to an ARM CPU in a custom SoC, and to delete features if you don't need them. The Cortex-M4 CPUs without floating point are a good example; try to buy a Pentium processor without a built-in FPU.

The thing that kept Intel in the driver's seat was software, and it was why (in my opinion) Itanium never had a chance over AMD64. Software, software, software. And once open source broke the server OS monopoly and smartphones became the computer for the masses, ARM was a good bet.

Is there a CPU saturation limit? That is an interesting question.


Yes, the specs of Centriq and Vulcan are probably similar enough to Skylake that there's no catching up left for them to do. At best they could advance at a similar pace to Intel/IBM.

When it comes to the desktop, Intel is driven more by business concerns than technology. Putting the southbridge on the processor doesn't have that much advantage, but AFAIK Zen can, to some extent, run by itself with no external southbridge.


Well, Vulcan is now kaput, Seattle seems disowned, and X-Gene is up for grabs, so only Qualcomm and Cavium remain as contenders in the server SoC space. I predict they still have a solid shot, but it will require a lot of uncomfortable adjustments along the way.


I wonder if the number of cores and threads will now follow a law similar to Moore's Law. 18 months seems too quick; maybe doubling every 8 years...


Moore's law states that transistor count doubles approximately every two years. So if that keeps up, you'll see either further shrinking of computer size, or use of the new transistors per square inch (more cores, more cache), or both.
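
To put numbers on it, the gap between doubling periods compounds quickly. A trivial sketch (the 8-year figure is just the parent's guess):

    def growth(years, doubling_period):
        # Exponential growth factor after `years` at the given doubling period.
        return 2 ** (years / doubling_period)

    # Transistor count at Moore's-law pace vs. core count doubling every 8 years:
    for years in (8, 16, 24):
        print("%dy: transistors x%.0f, cores x%.0f"
              % (years, growth(years, 2), growth(years, 8)))

After 16 years that's roughly 256x the transistors but only 4x the cores, so most of the budget would have to go somewhere other than core count.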



