Google's POWER8 server motherboard (plus.google.com)
196 points by self on April 29, 2014 | 145 comments



In case it helps, the larger context of this story is that IBM has spent a couple billion dollars developing a new server CPU (POWER8) that is just about to come on the market: http://www.forbes.com/sites/alexkonrad/2014/04/23/ibm-debuts...

They've also formed a consortium to promote this processor, of which Google is a flagship member (http://openpowerfoundation.org/). The expectation (or hope, or fear, depending on your point of view) is that Google may be designing their future server infrastructure around this chip. This motherboard is some of the first concrete evidence of this.

The chip is exciting to a lot of people not just because it offers competition to Intel, but because it's the first potentially strong competitor to x86/x64 to appear in the server market for quite a while. By the specs, it's really quite a powerhouse: http://www.extremetech.com/computing/181102-ibm-power8-openp...


I think that rumor also stated something about ARM, so Google may not be done designing its chips yet.

I'm glad they are finally doing this, not so much because I care about what happens in the server world, but because so many product chip decisions at Google have been political (by choosing Intel chips) simply because Otellini was on their board. Hopefully this will signal a change from that.


However, the "Google model" of computation involves a huge number of cheap "light" servers, instead of a few "big" servers (on which the Power model was based).

Well, the Power architecture had some success in Apple products, but that ended with IBM's inability to scale production and produce parts that consumed less power.


Google's servers are not that light, and this is a dual-socket board, so 20-32 cores or so, rather than a huge 16-socket Power board (those are the real scale-up machines), so it is not that much more scale-up. You also get more I/O bandwidth out of Power than Intel.


[citation needed]? I've not seen any useful IO benchmarks of POWER in a long time, if ever, and they've never been remotely comparable to more commodity systems, since POWER almost always gets used in the huge systems you mention...


Citation Given: http://www.extremetech.com/computing/181102-ibm-power8-openp...

POWER8 has 230GB/s of bandwidth to RAM compared to a Xeon's 85GB/s. That's nearly triple (about 270%) a Xeon's I/O speed.


ummm... I see STREAM copy benchmarks for Xeon reporting at least double the number you cite.

Further, a benchmark like this is complicated. How is RAM divided between sockets? What's the bandwidth between a CPU and memory in another socket? Etc., etc.


Unable to respond, citation not given.

But I'll respond anyways. It looks like you have GB and Gb per second confused. One is 8x the other.

admin-magazine: claims 120Gb/s [1]

intel claims: 246Gb/s [2]

Independent claims on intel forums range from 120-175Gb/s [3]

Intel's cut sheet for their own latest generation xeon states it only supports 25GB/s (200Gb/s) memory bandwidth [4]

[1] http://www.admin-magazine.com/HPC/Articles/Finding-Memory-Bo...

[2] http://www.intel.com/content/www/us/en/benchmarks/server/xeo...

[3] https://software.intel.com/en-us/forums/topic/383121

[4] http://ark.intel.com/products/75465/Intel-Xeon-Processor-E3-...


Reporting memory bandwidth in Gb is misleading.

The page you cite here: http://www.intel.com/content/www/us/en/benchmarks/server/xeo...

shows a triad bandwidth (STREAM) of 246,313.60 MB (megabytes) per second which is 240 GB (gigaBYTES) / sec.

If I'm making a mistake I'm sure we can work it out.


That 240GB is on 4 sockets, so corresponds roughly to the 85GB/socket cited above, while the Power7 compared was about half the Intel, so about 50GB/socket, while Power8 is allegedly at 240GB/socket.


Thanks for the clarification. HN took ~30-40 minutes before it would show a reply option for this post, so I went ahead and acknowledged the performance difference in a reply to my original reply (ugh). Anyway, that's great to see such high memory bandwidth per socket, rather than summed over the whole machine.


If you click the "link" button you can reply sooner...


[4] is not a valid link: the E3 is not the highest performing Intel chip wrt memory. The E3s don't have powerful memory controllers like the E7.


A reply to myself since I can't reply to the informative reply below (thanks HN). The RAM benchmarks for Xeon are per-machine (summed over sockets, presumably with no cross-socket traffic, since cross-socket memory is 1/2 speed) while the Power8 benchmarks are per socket.

That is indeed impressive. No wonder Google is considering these. Memory bandwidth is indeed critical.


[deleted]


Thanks, but I neither track my score nor care that you think I'm condescending.


The (now-deleted) comment you replied to was right. Your comment would have been much better without "ummm...".

All: please don't use snark in HN comments. It violates civility and degrades the discourse.


It's your opinion that "umm..." is snark. It's what I would say in person if I doubted your claims.

Notice that I actually had a question and it was answered in the thread, which I acknowledged.


Happy to take your word for it.

I could well be hyperallergic after years of wincing at gratuitous unpleasantness in HN comments (edit: and, to be fair, having contributed my own share of it). In person, of course, tone of voice would reveal everything about "ummm...".

Either way, please understand that the intent behind comments like this is in no way personal. The idea is to send feedback signals into the community about the kind of discourse we want to cultivate.


You're microoptimizing. Spending time complaining about an "um" or an "uh" which implied no semantic content other than what um or uh normally means is pointless. You can't micromanage every conversation, nor should you. Focus on outright negative comments. Mine was surprise and disbelief (which was later rectified by data). System working as expected, no SNAFUs!


If it was a snotty 'ummmm', I personally would object to it in real life, too. And I think others would as well. Also, you don't need 'ums' in writing.


Are you talking to yourself?


I think they could use this on edge servers as well, for those algorithms that they couldn't distribute enough.


I think you are mixing up Power and PowerPC (which was created by IBM, Apple and Motorola in 1991).


What exactly is the difference? Looking at this Wikipedia article, it looks like PPC is the descendant to the Power ISA. https://en.wikipedia.org/wiki/IBM_POWER_Instruction_Set_Arch...

EDIT: Digging a little more gave me the answer: "The POWERn family of processors were developed in the late 1980s and are still in active development nearly 25 years later. In the beginning, they utilized the POWER instruction set architecture (ISA), but that evolved into PowerPC in later generations and then to Power Architecture. Today, only the naming scheme remains the same; modern POWER processors do not use the POWER ISA."

https://en.wikipedia.org/wiki/IBM_POWER_microprocessors


https://upload.wikimedia.org/wikipedia/commons/3/3b/PowerISA...

Gives a nice visual presentation of what happened.


So presumably Google will manufacture their own POWER8 CPUs. But who would make them? TSMC? GloFo? Not IBM, since IBM will be exiting the fab business in the near future.

I am going to guess this dual-CPU variant is aimed at the Intel Xeon E5 v2 series. The 10-12 core versions cost anywhere between $1200 and $2600, although Google does get huge discounts for buying directly from Intel at their volume.

Assuming the cost to make each 12-core POWER8 is $200, that is a potential saving of $1000 per CPU, and $2000 per server.

The last estimates were around 1-1.5 million servers at Google in 2012 and 2M+ in 2013. Maybe they are approaching 3M in 2014/15, even if most of those are low-power CPUs for storage or other needs. One million CPUs made themselves could mean savings of up to a billion.

Could this kick-start the server and enterprise industry into buying POWER8 CPUs at a much cheaper price? And once there is enough momentum and software optimization (JVM), it could filter down to the web hosting industry as well.

In the best case scenario, this means big trouble for Intel.


In the link here they estimate the price for a single POWER8 CPU at $5000, based on real server prices.

http://www.extremetech.com/computing/181102-ibm-power8-openp...


That may be list price and keep in mind that a similar Xeon costs around $4,600.


This could also mean big trouble for AMD, since they're pushing high-density ARM64 in the data center; one could even go so far as to assume that Google isn't taking AMD seriously.


This POWER8 board has been in development for a long time and will only be used if it turns out to be cheaper than the Intel/AMD alternatives.

The platforms team at Google is also developing an ARM solution. They have GPUs in testing as well.


There is also the distinct possibility that they would use both. The best hardware for processing search queries is not necessarily the best hardware for hosting map tiles, etc.


What would be the point for Google to build their own CPU? How would it be different from what IBM has to offer?


The parent poster is referring to IBM leaving the CPU fab business, so of course a Google-produced CPU (most likely farmed out to a fab) is better than a ghost CPU produced by a shut-down factory with non-existent fab technicians/engineers.


Realistically, you don't shut down a multi-billion-dollar state-of-the-art fab. It may be sold to another company but it will continue to operate.


AMD has also been fabless for a while. Somehow the chips keep showing up...


Is it confirmed IBM will stop chip fab?


No.


Another possibility, with Power being open and having a market in servers, is eASIC, which offers chip manufacturing technology fit for lower volumes and easier customization, making a customizable server processor feasible.

In the places this fits, it could offer substantial improvement, for example 10-100x performance per cost and power for in-memory cache servers.

And they're working on making this tech programmable while still keeping the same cost levels.

And all this in the context of Moore's law grinding to a halt. So Intel will definitely have a hard time ahead.

EDIT: it appears that POWER8 supports an open extension interface to other chips (CAPI), which means we will see such accelerators sooner rather than later.


Do you mean ASIC for the coprocessors, or for customizing the main design? The latter isn't strictly impossible, but it'd be a pretty major undertaking to take the POWER8 design and put it on an ASIC. The reference design, at least, is intended for a 22nm SOI process, and it's not a trivial port to put it on something else.


I wonder if POWER8 based servers will be available for the mass market? I'm not sure whether Google is interested in commoditizing POWER8 servers or just participates in the OpenPOWER foundation to ensure that POWER-based servers will suit their needs. The fact that Google is open about their new motherboard hints at the former, but it's not much.

I wonder how a non-Google-scale developer could even potentially get to use POWER-based servers. Will they be available from the regular dedicated server hosting companies? What OS could they run? RHEL does support the POWER platform, but for a hefty price: https://www.redhat.com/apps/store/server/ CentOS doesn't, presumably because all the POWER hardware CentOS developers could get is either very expensive or esoteric. That likely means I don't have to consider using POWER-based servers for at least 3 years, right?


> What OS could they run?

Since POWER8 is little endian now it's pretty easy to get things running on it. We had Ubuntu ported in one cycle and 14.04 runs sweet on POWER8. All the compilers and the entire toolchain are ready to go. Everything in the archive works, just apt-get install.

All of your Linux workloads will probably just work on a POWER8 server. I started working on this server about a month ago, and had never used POWER-anything before. I just ssh'ed in, did my work, and unless I did a uname or noticed the URLs with the arch in them when upgrading, it acts just like my Ubuntu x86 machines.

Yesterday at IBM Impact we deployed SugarCRM /w MariaDB and Memcached, a Websphere petstore, and Hadoop (using IBM's Java), all at once from zero to fully deployed and serving in _173 seconds_. These machines are _fast_.

Disclaimer: I work at Canonical and helped run the demo backstage during the POWER announcement.


Wow, thanks! It's great to hear the information first-hand from someone "in the trenches". So, I guess, Ubuntu 14.04 is running, but it's due to some kind of "last moment" special porting effort by IBM, not an official version from Canonical? Or else, why isn't POWER support mentioned anywhere on Ubuntu's website? http://www.ubuntu.com/download/server


Not last minute, we've been working with IBM on this as part of 14.04. It's officially a supported platform for 5 years, here are the ISOs, you'll be able to get support for the entire thing from IBM and Canonical:

http://cdimage.ubuntu.com/releases/trusty/release/

It's not mentioned on the website because the hardware is not publicly available yet. We have announced it on the blog though:

http://insights.ubuntu.com/2014/04/28/the-ubuntu-scale-out-a...

When the machines start shipping in real life (I think they said June?) it'll be more obvious on the main site.

All the surrounding ecosystem bits around Ubuntu will also get POWER8 support, so PPAs will start building POWER8 binaries, and all of the deployable services available on jujucharms.com will be available as well.


Depending on your definition of 'mass' you can buy them now:

http://www-03.ibm.com/press/us/en/pressrelease/43702.wss

and

http://www-03.ibm.com/systems/power/hardware/s812l-s822l/bro...

I can't stand how their 'buy now' link for a product with a listed price then links to a 'get a quote' form. If they didn't do stuff like that I might have bought one of their machines instead of the HP that is currently churning away happily (32 cores, 192G of RAM, quite the little beast).


What OS would you run on a POWER server?

Which model and configuration of HP server did you buy and for how much?

I actually had popular dedicated server hosting providers in mind, e.g. Leaseweb. http://www.leaseweb.com/en/dedicated-servers

Re: linked servers. Thanks for the concrete info! $8K for 10-core / 3.4 GHz POWER8 with 32 GB RAM and 2x300 GB 10K rpm drives.

Those have to be some freakishly good 10 cores, to justify that kind of a price at least for _some_ use cases.


The machine was about 6K euros with the base memory of 32G; I forgot what I paid for the memory upgrade, but it wasn't all that much.

DL 385 G7 iirc, cpuinfo gives 2 times (AMD Opteron(TM) Processor 6274), max ram is 256G


http://www.enterprisetech.com/2014/04/28/inside-google-tyan-...

Tyan boards, RH, SLES, Ubuntu.

> I wonder how

http://www-304.ibm.com/events/idr/idrevents/detail.action?me...

IDK when POWER8 gets integrated into Bluemix, but it will as systems get out there.

Virtual Loaner Program:

http://www-304.ibm.com/partnerworld/wps/servlet/ContentHandl...


Considering that IBM is a cloud provider, you might guess that Power servers will be available that way.

Also, Ubuntu works better on Power than CentOS.


Can someone explain the benefits of POWER8 as compared to Intel? I thought the volume of POWER8 chips being low (as compared to the exceedingly powerful Intel and ARM chips) would mean that innovation in that area would be low as well.


Ridiculous parallelism. A POWER8 chip has 12 cores, and each core can handle 8 threads. As a result, these chips can keep the pipeline pretty much always full, and provide massive performance boosts to things like database servers.
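
To put rough numbers on it (a hedged sketch, assuming a dual-socket board fully populated with 12-core parts in SMT8 mode): the OS would see 2 x 12 x 8 = 192 logical CPUs, which is what a quick check like this would report:

    // Minimal C++11 sketch -- nothing POWER-specific, just shows the SMT
    // fan-out as userspace sees it. On a dual-socket 12-core POWER8 in
    // SMT8 mode this should print roughly 2 * 12 * 8 = 192.
    #include <iostream>
    #include <thread>

    int main() {
        std::cout << "logical CPUs: "
                  << std::thread::hardware_concurrency() << "\n";
        return 0;
    }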


Sun's Niagara line of processors had similar numbers of threads (64/128), but they often had lacklustre performance. With this amount of parallelism, it becomes very difficult to keep all the threads busy, even for highly scalable programs. You'd get hit by all kinds of problems, like blockages due to memory throughput, or shared resources (IIRC the Niagara threads shared FPUs and other processing blocks, so to get them all running 100% of the time you'd have to manage the workload of your tasks insanely carefully)

On top of that, so much stuff just doesn't scale well. On the early Niagaras, even ssh-ing into the machine was noticeably slow. Oh? your crypto doesn't use all 64 threads? Hard luck!


IBM's semi guys have been doing all kinds of wacky things with low volume chips (and also high-volume, see eg. prev gen consoles).

For example their System/360 compatible mainframe CPUs are doing 5.5 GHz now. https://en.wikipedia.org/wiki/IBM_zEC12_(microprocessor)

Chip design is somewhat like software, adding more people or resources doesn't necessarily make your product superior. Look how long tiny AMD has been putting up a fight.


The advantage is Google can wave this mobo around in Intel's face to get a better price on xeons.


POWER is fast. I have remote access to a 64-way POWER7 server through work and it really rocks.


That's the interesting thing about POWER. Uses 250 watts? No problem. Costs $5000? Whatever. They only seem to have one design criteria: It has to be fast.


The problem with POWER7 has been the IO connectivity. The use of the GX++ bus has been a huge bottleneck (something they are obviously fixing for POWER8). The theoretical bandwidth on GX++ is 20Gbit and so its basically the equivalent of a single x4 PCIE 2.0 slot. This was borderline bad in 2010 when POWER7 was released, now it looks even worse.


I used to work on Google's indexing system, and I guess for lot of Google's workloads, total machine performance-per-Watt is the key metric. Many server workloads are largely coordinating the disk controller and the network controller, with maybe some heavy integer and a bit of fp processing in the middle, so your normal pure CPU performance-per-Watt benchmarks can be misleading. ("A supercomputer is a device for turning compute-bound problems into I/O-bound problems." --Seymour Cray, possibly apocryphal) POWER8 uses a lot of Watts, but it also has high I/O throughput, so for many server workloads it could beat Intel chips in full system performance-per-Watt. I would be really interested in seeing an I/O-per-Watt CPU benchmark.

Many of Google's workloads are embarrassingly parallel. I used to work on Google's indexing system, and one of the binaries I worked with had a bit over a million threads (across many processes and machines) running at any moment, with most of those threads blocked on I/O. POWER8 has a fair number of hardware threads per core, which should help with the heavily multithreaded style of programming used in many Google projects.

Google's datacenters use enormous amounts of power. Several of their locations are former aluminum smelting plants, because aluminum smelting also uses enormous amounts of power, so the power lines are already in place. My team luckily happened to sit next to one of the guys who designs Google's datacenters, and we happened to overhear him say something over the phone about not being able to make sense of enough power to power a small town just disappearing from our usage. One of the guys on my team asked him when this power reduction occurred, and if it had happened before. We worked out that the huge swings in power usage happened when we shut down the prototype for the new indexing system. Most of Google's datacenters are largely empty space because they are limited by the power lines and cooling capacity.

In another instance, we discovered a mistake had been made in measuring the maximum power drawn by one of the generations of Google servers. Google had a program designed to max out the systems, and they plugged a server into a power meter wall-wart and ran this program for a while. The maximum power usage was under-estimated due to a combination of the office where the measurement was made being cooler than a datacenter (electrical resistance of most conductors has a positive temperature coefficient in the range of temperatures found in working servers), the machine not being allowed sufficient time to warm up, and/or the indexing system being more highly tuned than the program designed to maximize server utilization. (I like to think it was mostly the latter, but I suspect the first two were the main contributors.) The end result was that cooling was under-provisioned in one of the datacenters. During a heatwave in the area occupied by the datacenter used for most of the indexing process, the datacenter began to overheat, so one of the guys on my team was getting temperature updates every 10 to 15 minutes from a guy actually in the datacenter, and adjusting the number of processes running the indexing system up and down accordingly in order to match indexing speed to the cooling capacity. When you're really truly maxing out that many machines 24/7, some machines will break every day, so the indexing system (like most Google systems) is tolerant of processes just being killed either by software fault, hardware fault, or the cross-machine scheduler.

A combination of realizing that smartphones were really going to take off and were all ARM-powered, and realizing how important total machine performance-per-Watt is to some major server purchasers, led me to invest in ARM Holdings in mid 2009. (This does not constitute investment advice. The rest of the market has now realized what I realized in 2009 and I don't feel I have more insight than the market at this point.)


One benefit is that IBM is allowing a direct link to the CPU via their CAPI (Coherent Accelerator Processor Interface). Currently, Intel has frozen everyone out of using their QPI. This resulted in NVIDIA no longer being able to make chipsets like the Ion.

An NVIDIA chipset and GPU would be able to go well beyond what NVIDIA is able to do with Intel chips (limited to PCI hooks).


Large photo of this motherboard: https://www.flickr.com/photos/ibmevents/14051347355/sizes/o/

They've masked all the chips with something black. Are they hiding chips they are using, or is this something for thermal dissipation?


Looks like typical Google "secret transparency". You can look but you won't learn anything.


Two things. First, slightly off topic: is there any way this could be a negotiating position with Intel, on price?

Second: while many CPU cores (with enough I/O) are great for large Borg map-reduce jobs, I am curious to see whether Google will develop/use better software technology for running general-purpose jobs more efficiently on many cores. Properly written Java and Haskell (which I think Google uses a bit in-house) help, but the area seems ripe for improvement.


Google is a flagship partner in the POWER8 consortium, so I doubt it's just for leverage against Intel.


Google's position as a major backer of the competition gives them credibility at the negotiating table. Intel won't give them better terms unless Google can demonstrate a viable alternative.


Funny layout. I would like to know why the PCI slots are spread out like that.

I know Google doesn't have a standard rack setup, but still, it would make sense to have all the expansion ports at the end of the board... No?


Maybe these connectors are for daughterboards with Centaur chips + RAM. (A POWER8 can connect to 8 Centaur chips, which is where the RAM is attached.)


They must be, as there are otherwise no RAM chips on the board...


A layout like this means you can use both sides of the motherboard for I/O slots. So in a 1/2u box, you can get more than one or two expansion cards in place. The PCI slots themselves seem to be hammock connectors, which I was curious about too, and googling doesn't seem to have any info, unless it's too early/late and I'm missing the obvious.


It would be great if somebody could list modern computers for personal use which are still based on the Power architecture.


Both Xbox360 and PS3 use modified POWER-designs.

Edit: Also, it's notable that both Xbox One and PS4 switched to use x64.


Wii and WiiU also have POWER-based designs IIRC.


Both are PowerPC 750-based according to Wikipedia, which is part of why the Wii U runs games released for the Wii.

Note that PowerPC, as used in the Wii, Wii U and old Macs, is not exactly the same as POWER, as in this announcement. The POWER architecture is used by IBM AIX and AS/400 servers, and by the PS3 in its Cell variant.


There is contradictory evidence whether the Cell's PPE was 'PowerPC' or just a 'Power' core. In any case, it could run PowerPC software. Incidentally, three of those exact same cores are used as the CPU of the Xbox 360.

For reference, the PowerPC 750 derivatives in the GameCube/Wii/Wii U are in the same family as the PowerPC G3 used in Macs around the turn of the century. It is also related to the CPU running Curiosity on Mars. So yeah, although Nintendo is a big customer of the Power architecture at the moment, they're not really breaking new ground.


The Cell is the world's first in-order execution PowerPC. It's similar in design to the G3 family but has a very high clock speed. They stripped a lot out of the chip design (such as the out-of-order execution pipeline, a lot of the cache brains, etc.) to get the core as small and as low power as possible, while relying on modern compilers to make the magic happen.

I'm not entirely sure they succeeded in their goals, but with how well SPEs are used in PS3 games, I'm not sure it matters.


When you think about it, it makes sense for a CPU in a game console to not include things like out-of-order execution. 100% of the software it's running is programs compiled for that exact machine. Therefore you can just tell developers to use a particular set of compiler flags and get acceptable instruction scheduling.

Contrast this with the software a PC runs -- mostly compiled to be optimized for a "generic" x86 CPU. In fact, it may have been compiled many years before the CPU was even designed. There is a lot more scope for runtime re-ordering to improve execution unit utilization.

If the whole world ran Gentoo, commodity CPUs probably would be in-order too.


Well, if you look at modern IBM System/360 descendants (z/arch, etc.), this is almost what they do. Programs are compiled to an IL bytecode, and then recompiled during install to produce a CPU-specific binary. It's largely the same concept.


The POWER architecture isn't used anymore, even in POWER CPUs. Everything now uses the Power ISA, which used to be called PowerPC ISA. I know, it's confusing.


The pace of innovation was greater on the x64 side, as was software support, especially the build tools in general. Also, PS3 game developers had a hard time dealing with the Cell processors.


For something more esoteric:

AmigaOne's from A-Eon (http://www.a-eon.com/) and A-Cube (http://www.acube-systems.biz/index.php?page=hardware&pid=7) both running AmigaOS 4, and optionally Linux (at least for the ones from A-Eon, not sure about the ones from A-Cube).


For personal use, very little. You can buy an IBM low end server for $8k or so eg http://www-03.ibm.com/systems/power/hardware/s812l-s822l/spe...

There are some embedded machines, but not many now.

Apple G5s are still serviceable and supported by modern Linux distros, and are almost free...


You need code that scales to several CPUs and stalls on memory access to see an advantage for Power. Most desktops apps don't go beyond 2 threads.


This is significant. With POWER back in the game, and ARM server CPUs arriving, Intel will again have competition.


There's a huge difference between a custom-made server board and a mainstream CPU for general-purpose use in desktops or commodity servers.

ARM is definitely taking market share from Intel on the low end, but AMD is just about the only viable competition Intel has right now in the part of the market where they make the bulk of their income and margin.


If it is seen as a credible threat, there is no need to walk the whole way. It is enough already if Intel gets the message.


I think currently there are few big customers that can afford the overhead of porting their code and dependencies to a different architecture.

For example, I don't expect cloud providers to have a huge market soon for non-x86 architectures. Well, there are JVM or other VM users which in theory need not care, as long as you don't need some native library.

In the past the battle with intel had to be played by providing an alternative implementation of the x86 instruction set for precisely the same reason: legacy.

The mobile market proved you can achieve good performance with ARM, and especially better performance per watt. I really can't wait to see some more fights in this arena.


Writing architecture-agnostic C++ or C code is mostly just a matter of avoiding some silly tricks which you shouldn't be doing anyway, and using the correct types and sizeofs.

Porting an operating system is hard. Porting a compiler is hard. But most applications on top of those are fairly straightforward to port, if not completely trivial.
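
A minimal sketch of the "correct types and sizeofs" point (hypothetical example, C++11; the names are made up):

    // Prefer fixed-width types over "long" (4 bytes on some ABIs, 8 on
    // others), and take sizes from sizeof rather than hardcoding them.
    #include <cstdint>
    #include <cstdlib>

    struct Record {
        std::int64_t  id;
        std::uint32_t flags;
    };

    Record* alloc_records(std::size_t n) {
        // sizeof(Record) is whatever this target says it is
        return static_cast<Record*>(std::malloc(n * sizeof(Record)));
    }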


We have kind of forgotten this.

Anyone old enough to remember the early browsers - Mosaic and Netscape - will remember how every platform was supported, regardless of CPU (including MIPS, SPARC, Alpha etc.) and regardless of OS (IRIX, AIX, System whatever, VMS, Windows, etc.). Furthermore you didn't have to wait two years for the non-Windows version to be updated.

Only when Microsoft made NT Intel only did you get this arrangement where software was locked to a given architecture/OS. Thank goodness Linux came along.


It's possible but not trivial. If a programmer has never been exposed to any memory model other than the Intel PC model, they are in for big surprises when they start using POWER. When will a write from thread A be visible to thread B? Can this atomic load pass that other load? For multithreaded C++ programs these are tricky. For other languages that always communicate between threads using messages, maybe not, but also maybe your favorite language runtime doesn't exist on POWER.
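
A small C++11 sketch of the kind of surprise I mean (hedged, not from any particular codebase): the flag-then-data pattern below often appears to work on x86 even with relaxed ordering, but on POWER's weaker memory model the release/acquire pair is what actually makes it correct.

    #include <atomic>

    int payload = 0;
    std::atomic<bool> ready{false};

    void producer() {
        payload = 42;
        ready.store(true, std::memory_order_release);  // not relaxed
    }

    void consumer() {
        while (!ready.load(std::memory_order_acquire)) { /* spin */ }
        // With acquire/release, payload is guaranteed to be 42 here
        // on any architecture, POWER included.
    }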


They're only tricky if you're communicating data between threads without using locks. Since this has always been a "here be dragons" area even with x86's convenient memory model, code that gets it right for x86 but not other architectures is pretty rare.


Many Linux distros have Power support; Ubuntu just added little endian ppc64 support for additional compatibility. JVM is available, and most other software. It is not that much work...


Cool, having the same endianness helps.

Sure, it's doable, I have a few ppc at home, but I know first hand stories of small/medium companies just not wanting to risk that.

They just have higher costs for maintaining some dependencies, custom builds, etc. You never know when you will get some new version of something, like JDK 8. Then you have things like a missing Go compiler (yes, there is gccgo for ppc, but still not everything works the same).

It's perfect for enthusiasts, and it's OK for companies with strong investment in IT infrastructure. I just wonder what the best way is to convince those small companies that there is no problem. Perhaps the tools/distros etc. are starting to mature at the right point and this will soon no longer be a big practical problem.


Having gone through a few architectures, I think the worst offender is the huge area of undefined/unspecified behavior in C and C++ compilers, where many developers assume their compiler's behavior is part of the standard.

Other languages with more OS agnostic libraries tend not to suffer that much from porting issues except from bit fiddling code. There is however the availability issue as you mentioned.


This was just asked on the Golang mailing list[0].

> nobody is working on it very seriously.

[0] https://groups.google.com/forum/#!topic/golang-nuts/sDV6ZfhG...


Maybe Go will get fixed if Google is trying ppc...


I think that's the tell. Go is developed at Google, largely for Google, and there's an ARM port but no POWER port.


I'm not sure that one motherboard can count as "back in the game".


250W TDP in a package that size... As the article correctly states, it's about how many FLOPS you can get inside a rackmount case. That TDP alone is going to mean that you won't be able to put that many in a single case.

A dual-socket board: 500W on CPUs, 600W with everything else. The power supply would have to be something special, but the biggest challenge there would be getting the energy (a.k.a. heat) back out of the box.

GPUs have similar TDPs and issues - that's why the HSFs on top of them are so massive (and hence GPUs have a bit of an advantage here - they have the entire PCIe board to fit their cooling hardware on).

Finally, 4.5GHz? What the hell? In one clock cycle, a beam of light wouldn't even get halfway across the board (EDIT: not chip). Branch/cache/TLB misses may literally kill any reasonable performance you might hope to get out of it. Intel gets around this by having years of market-leading research in branch predictors, caching models, etc., and it's going to be no mean feat to match that.

I know IBM isn't exactly new to this game, but AFAIK x86 has always been faster, clock for clock, than POWER.

That said, I hope my concerns are misplaced. I'm hoping Intel gets some competition in the server room. It will be of benefit to everyone.


Light would travel about 660 millimetres in 0.22 nanoseconds, and the chip is about 25 millimetres on the side, so a beam of light could run a few laps around the chip in one clock cycle, or bounce off the sides 20-30 times. Maybe you wanted to say across the motherboard?

I don't think 4.5 GHz is somehow ridiculous when 3 GHz is routine (and POWER7 was 4.2 GHz). Hundreds of cycles of latency when accessing anything off the chip is now routine - that's the world we live in now. I think that the biggest problem is that IBM is not able to make the investments (especially in semiconductor manufacturing) to match Intel's rate of bringing technology to market. The current POWER7 is a 45-nm device if I remember correctly, and this 22-nm POWER8 is not yet on the market. Intel has been selling 22-nm Haswells for how long now? And of course the POWER7 chips have been up against next-generation semiconductors for most of their life.

EDIT: I see that IBM started selling POWER8 systems a few days ago. That's close to a year later than Haswell, and what's more, this chip is likely to compete against 14-nm processors for most of its lifetime.


  Light would travel about 660 millimetres in 0.22 nanoseconds
Important note: the wave propagation speed in copper can be as bad as .42c
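
Back-of-the-envelope with that figure (a rough sketch):

    d = \frac{0.42\,c}{f} = \frac{0.42 \times 3.0\times10^{8}\ \text{m/s}}{4.5\times10^{9}\ \text{Hz}} \approx 2.8\ \text{cm}

So even at 0.42c a signal covers roughly the ~25 mm die edge mentioned above within one cycle, with little to spare - and nowhere near the whole board.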


Aren't they unique in using DRAM for the lowest level of onboard cache, and therefore have a lot of it since those are just tiny cells that store a charge for a while?

Yeah, starting with the POWER7. POWER8 has 96 MiB of eDRAM (e for embedded).

The Centaur memory controllers also have 16 MiB of eDRAM; max them out at 8 and you get 128 MiB total of L4.

Compared to Intel's current offerings, the L1 data cache and L2 unified cache are twice as big. Don't know about timings, though.

The biggest Intel Ivy Bridge Xeon server CPUs have slightly more transistors (by about 100 million), but on a much smaller die, 31% less area. Look at the ones with 12 and 15 native cores: https://en.wikipedia.org/wiki/Ivy_Bridge_(microarchitecture)... they list at $2336 to $6841.


Light travels 30 cm/nanosecond. 30/4.5 = 6.6 cm, larger than the chip.


Between people shifting from pc to arm-powered phones and major data-center users doing their best to cut costs this is shaping up to be a tough decade for Intel.


In all seriousness, I would not want to be leading Intel right now as I can't imagine what they could actually do to escape this.

Hindsight makes Itanium look like even more of a disaster, when that energy in that era should have gone into evolving the x86 platform for the future. Without AMD doing what they did (x86-64) I wonder where Intel would actually stand in the server market today.


Intel has a massive advantage in process technology and fundamental physics research and a huge, talented staff. A good executive team can print money in any line of business with all those assets.


For a while, yeah. But there really is no such thing as a truly sustainable competitive advantage. Anybody can be knocked off their perch if they don't adapt quickly enough.


From what I've heard, modern Intel chips basically only keep up the x86 instruction set as a facade, and the architecture beneath is different (a much larger number of registers, etc.). Wouldn't it potentially be a good idea to do a clean redesign of the "frontend" and eliminate all the legacy support?


Hell, to take this to the extreme level, look up the PPC 615 — mid-90s IBM processor that sadly never made it to market. It ran x86, PPC32, and PPC64 code natively in the mid-90s, socket compatible with the current Intel Pentium and with comparable performance. Sadly, it was cancelled as IA-64 was expected to enter the market soon, and those at IBM presumed IA-32 (i.e., x86) was about to be killed.


Yes, modern Intel CPUs use a RISC-like architecture underneath. The CPU contains a decoder unit which converts x86 instructions to RISC-like micro-ops. Getting rid of x86 support would not be a good idea. It's their "legacy support" which has allowed them to dominate the desktop and server market. Porting your software to a new architecture can be a real pain.


I wonder if it would be reasonable to come up with a more modern ISA (or just borrow somebody else's, like AArch64?) and offer that as an alternate front end.

Keep the x86 decoder front end that they have now. Add another one for the better ISA. Add another mode that kicks the CPU into that ISA. Current x86-64 OSes already generally support two architectures: x86-64 and i386. This would just be a third one. Then everything could move to the new ISA incrementally, and legacy software could keep on working forever using the legacy decoder.

I'd guess that the x86 ISA is no longer enough of a bottleneck to justify it. Throw enough transistors at the problem and perhaps it doesn't matter anymore whether your ISA makes any sense.


Couldn't the decode unit be replaced by software that converts legacy software on the fly? It seems to me that the outdated instruction set is detrimental to innovation and power efficiency. Also, how high is the overhead in terms of chip space and power that the decoder incurs, vs. one that has to decode a simpler instruction set? My limited understanding is that the number of instructions issued per cycle depends heavily on the instruction set and decode speed.


> Couldn't the decode unit be translated into software that converts legacy software on the fly?

Not only is it possible, but it has been done. Look up the Transmeta Efficeon. Note that it's not RISC-like but VLIW (which I like to think of as "the next thing after RISC").


Transmeta tried to do exactly what you are proposing i.e, have a software layer (Code Morphing) translate the x86 instruction stream to their native VLIW instruction set. They didn't go very far.


In the case of Intel the underlying architecture is really fast, whereas my understanding is that Transmeta failed because the VLIW architecture did not really work out.

If Intel CPUs use a RISC-like architecture, nothing would prevent static translation to it, which was not feasible for Transmeta CPUs. The x86 software layer would then only be there to preserve backwards compatibility.


You just suggested that decode is a bottleneck, and that decode should be done in software, in the same paragraph.


Not really, I suggested decoding once at the start of the application, or even at compile time for that matter, in order to retain legacy support. A CPU has to decode the instruction stream no matter what; if the decoding is complicated by the fact that there is no close correspondence between the instructions and the hardware, that is a potential bottleneck.


That's basically what Itanium was and it failed vs. the AMD competitor that kept support.


Maybe, maybe not.

IBM was pretty close to cutting off most funding to future development, and they just closed the factory in Minnesota that made the servers. They are pretty much the last survivor of the Unix wars.

There are major customers using this stuff and scaling up, but the industry as a whole is shifting to scale out. There might be a good story here with POWER8, but can you trust that the platform will be around?


"they just closed the factory in Minnesota that made the servers"

....which they moved to Guadalajara, Mexico. Given the money they put into Watson, I would expect they keep the POWER machines.


I would guess that if they were projecting lots of growth in the line, they wouldn't have done a disruptive factory move.

The business growth for Power was displacing PA-RISC, SPARC and Itanium... IBM is the last man standing, but is competing against whoever is supplying Amazon, etc. with barebones x86. In the IBM line, it's jammed between commodity x86 and high-margin zSeries.


I'm not so sure, MN is not business friendly.


I wouldn't rely on the Watson connection to influence their chip division strategy.

It's not something they spread around, but from what I recall from a Q+A with IBM engineers the Watson prototype was developed on AMD xSeries servers, and only moved to pSeries machines late in the development before the Jeopardy! matches.


So they're saying it's easier to use a brand new incompatible little endian Linux personality, with associated new toolchains and new ports of low level stuff etc compared to the standard Linux PPC64 stuff...

Sounds kind of surprising even if IBM did some of the bringup work ahead of time, but maybe they've got little endian assumptions baked in many internal protocols/apps.


Linux has supported little-endian POWER for several years. It makes porting userspace software tremendously easier, since the major architectures in use with Linux (x86_64, ARM, MIPS) are frequently LE.

The big news here is official support for KVM on POWER. Use all your existing automation, Openstack, etc, unchanged.


KVM support indeed sounds nice, though they advertised it already before for POWER7/PPC970 hardware...

Re endianness, Linux is comfortably bi-endian and so is approximately all portable Linux software. Certainly there isn't any "tremendous" difficulty, which is why I was expressing surprise at ditching the standard big-endian ppc64 userspace.

When people first started floating around the little endian PPC patches in 2010 it seems the motivation was some GPU hardware (http://lwn.net/Articles/408848/) but that doesn't really make sense in this case.


Notably, the goal with the GPU hardware was to avoid doing endianness conversions in userspace code that talked to GPU drivers.

userspace, the root of all evil


Linux has supported POWER for ages. Is endianness such a big issue? Why?


Endianness is an issue because programmers ignore it - they think "undefined behavior" is a synonym for "not yet standardized", and the mentality of "works on my machine" typically trumps concerns of portability.

This isn't a concern for low level developers, such as the kernel developers - they understand the concerns and take care to implement code in portable ways.

The issue is with user-space developers who think C and C++ are a good choice of language, and who have no qualms using bitfields, unguarded compiler pragmas, violating the strict aliasing rule, and failing to specify the endianness their protocols use in the protocol itself (BOMs are not universally used) - also there is often a failure to provide the endianness conversions in implementations of such protocols where necessary. Not to mention a complete lack of a standard way to test the endianness of the current machine, which typically requires violating the strict aliasing rule to check.
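
For what it's worth, the endianness half of this has a boring fix: specify the byte order in the protocol and assemble values byte-by-byte, so the host's endianness never enters into it. A hedged C++ sketch (hypothetical helper, not from any real codebase):

    // Read a 32-bit big-endian ("network order") field from a wire
    // buffer. No casts, no bitfields, no host-endianness detection.
    #include <cstdint>

    std::uint32_t read_be32(const unsigned char* p) {
        return (std::uint32_t(p[0]) << 24) |
               (std::uint32_t(p[1]) << 16) |
               (std::uint32_t(p[2]) <<  8) |
                std::uint32_t(p[3]);
    }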


I believe it's safe to assume having more endian-diversity is then a good thing. Bugs in software and protocols will be exposed and eventually corrected.

Since most Linux distros fully support a very diverse set of machines, endianness is usually not a problem with most of the software that's already part of a Linux distro.

As for software developed inside Google, they hire smart people. They'll manage.


I think the main reason for the LE run mode is to be able to easily interface with NVIDIA GPUs, which are also LE (without conversion).


Does that say "little-endian support"? Like you just set a flag and all your math switches from big-endian to little-endian?


Little-Endian mode for MIPS processors was sometimes called SPIM, so I guess little-endian POWER is REWOP. I never heard little-endian SPARC referred to as CRAPS, which is too bad as I thought that was pretty funny.


I believe Power has long been a bi-endian architecture. I gather it's a switch thrown (in either software or hardware) at startup.


The old PowerPC processors did this by flipping the low bits of memory addresses when in little-endian mode, but the data lanes had to be reversed to make this work. So back in the day, it only meant that you could use the same chip for a little-endian design but not the same motherboard. I don't think that's how newer POWER processors work, though.


Correction: a modified memory controller wasn't necessary as long as all of your memory accesses were naturally aligned. As I remember, in little-endian mode, unaligned accesses would also trap to the kernel, so kernel authors could include code that would fix things up, at a huge performance penalty for unaligned access.

Most architectures that support unaligned access have a small penalty for unaligned access anyway, and some architectures (Does anyone remember Netscape Navigator on Solaris SPARC crashing with SIGBUS much more often than the same Navigator release crashing on x86? At least Solaris 6/7 didn't include kernel code to emulate support for unaligned memory access on SPARC.) don't support it, so it's best to avoid unaligned memory access in C code.

I don't recall the JVM specification forcing a particular object layout on an implementation, and I believe most JVM implementations naturally align all object fields rather than packing them for minimum space usage. I believe an implementation could reorder the fields in order to optimally pack them while avoiding unaligned accesses, at the cost of breaking any hand optimization of locality of reference made by the programmer. However, I think the space savings for almost all programs would be very meager.
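
As a hedged sketch of the "avoid unaligned access in C code" advice (hypothetical helper): express the load through memcpy instead of a pointer cast. On architectures where unaligned loads are safe, compilers reduce this to a single instruction; on the strict ones it avoids the trap/SIGBUS path entirely.

    #include <cstdint>
    #include <cstring>

    std::uint64_t load_u64(const unsigned char* base, std::size_t offset) {
        // Avoid: *reinterpret_cast<const std::uint64_t*>(base + offset);
        std::uint64_t v;
        std::memcpy(&v, base + offset, sizeof(v));
        return v;
    }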


The 970 (G5) didn't have the little-endian mode. Virtual PC was impacted.


there are a number of software-switchable processor arches that support bi-endian.


Would these be too pricey as hypervisors for cloud compute? It seems to me to be ideal for CPU thread intensive applications like databases, on-demand transcoding.

What are some use cases for a server like this for Google? I'd love to see these available in the IBM Cloud (SoftLayer) but I think they will be too pricey and reserved for enterprise.


If these are cheap enough (someone said $200 a pop with bulk discounts) then cloud providers will get a big boost.


You can also logically partition these beasts into multiple real servers. Who needs a hypervisor when you can have 96 "real" servers sharing the same hardware?


Pardon my ignorance: what's the difference between using a hypervisor and running "real" servers that share hardware?


Memory bandwidth would be a nightmare, not to mention every other kind of I/O.



I think it's interesting that they didn't include the "traditional" mouse/keyboard/VGA ports. Not particularly surprised since this is a server motherboard, but still interesting. I think I do see an HDMI connector in the lower right next to a tall silver port (possibly a USB connector).


It's a bit short on details IMO: where are the specs, the benchmarks, etc.?


It's actually quite impressive that Google would open up this much of their secret sauce; a lot can be gleaned from looking at this board. You can bet that this is not exactly revision one (and you can bet as well that this is likely not their latest and greatest; no need to show off more than you have to, competitive edges are pretty thin).

When I see stuff like this it is painfully clear that, from a technological perspective, a company like DuckDuckGo has a huge defensible moat to cross before they can begin to be a serious contender. Think about it for a second: the company that you're trying to compete with is operating at such economies of scale that it can afford to have its own custom motherboards + non-standard expansion boards made.


I love it when Google announces something through Google+



