TheNextPlatform is a pretty bad site. They rehash - badly - information available elsewhere, and add a hyperactive spin on it all.
Here's the truth:
Google uses lots of compute power (insightful!)
Google isn't shifting to Power.
Google does have an active R&D program looking at Power.
TheNextPlatform misses the whole point here: that Zaius board has 32 DDR4 slots (commercially available servers from e.g. Dell max out at 24) and it has 2 NVLink slots! (!!)
Those NVLINK slots are what Intel should be worried about, because that's where Google is prepared to pay money. They are building computers that lock themselves into NVidia and doing it gladly.
Intel better find a way to compete with NVidia on deep learning.
NVLink is a communications protocol developed by Nvidia. NVLink specifies a point-to-point connection between a CPU and a GPU and also between a GPU and another GPU. NVLink products introduced to date focus on the high-performance application space.
and [1]:
NVLink – a power-efficient high-speed bus between the CPU and GPU, and between multiple GPUs. Allows much higher transfer speeds than those achievable by using PCI Express; estimated to provide between 80 and 200 GB/s.
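Rough back-of-the-envelope on those numbers (taking PCIe 3.0 x16 at its ~16 GB/s-per-direction theoretical peak, and the 80-200 GB/s range straight from the quote above):

    #include <stdio.h>

    int main(void) {
        /* PCIe 3.0 x16 is roughly 16 GB/s per direction (theoretical peak);
           the 80-200 GB/s NVLink range is taken from the quote above. */
        double pcie3_x16_gbs = 16.0;
        printf("NVLink low end : %.1fx PCIe 3.0 x16\n",  80.0 / pcie3_x16_gbs);
        printf("NVLink high end: %.1fx PCIe 3.0 x16\n", 200.0 / pcie3_x16_gbs);
        return 0;
    }

So somewhere between 5x and 12.5x the link bandwidth, before accounting for protocol overhead on either side.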
What's the current state of Power development in the Linux kernel like? I thought it was only IBM holding the fort (via ozLabs) but this could be a big boost.
Why does Facebook's Open Rack use a nonstandard rack size? That seems like an obvious barrier for adoption of hardware that was designed to be a commodity.
Interesting. What is the issue with 48V? The equipment for it seemed to be overpriced. I remember pricing out some gear, and as soon as a 48V power option came into play the price rose quite a bit.
Or is it that voltage is not high enough to be efficient for a large data center?
The voltage is high enough. Facebook's solution works like this: you have triplets of racks: the left and right racks hold computers, the middle rack holds networking, power distribution, and UPS.
The computers have extremely simple power supplies that basically can't fail and just DC-DC convert from 48V to 12V, 5V, and 3.3V; and the large-scale power supplies that convert the datacenter's three-phase 240V (or whatever you're supplying it with) down to 48V are much more efficient than the ones that would have been in each server (which you would usually have fed something like single-phase 208V).
Redundancy is supplied by just hooking multiple transformers in the middle rack to the + and - terminals on each PSU, instead of a convoluted multi-module redundant PSU (which always uses a single backplane, and backplanes in redundant PSUs fail surprisingly frequently).
The total round-trip efficiency of this system is about as high as you can realistically get. 80Plus Titanium is 90-95% efficient (depending on load), but there are efficiency losses in rack-level distribution, which 48V tries to correct.
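As a toy illustration of where that claim comes from (every stage efficiency here is a made-up placeholder, not a measured figure; plug in your own numbers):

    #include <stdio.h>

    int main(void) {
        /* Conventional path: an in-server 80Plus Titanium PSU (~94% at a
           typical load point) plus losses in rack-level AC distribution
           (~98%). Both figures are illustrative assumptions. */
        double conventional = 0.94 * 0.98;

        /* Open Rack style path: a centralized three-phase rectifier down to
           48V (~97%) plus the simple 48V -> 12/5/3.3 DC-DC stage in each
           server (~97%). Also illustrative assumptions. */
        double busbar_48v = 0.97 * 0.97;

        printf("per-server PSU path:  %.1f%%\n", conventional * 100.0);
        printf("centralized 48V path: %.1f%%\n", busbar_48v * 100.0);
        return 0;
    }

The point isn't the exact percentages; it's that the 48V design pulls the lossy AC conversion out of every server and does it once, efficiently, per rack triplet.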
However, 48V DC can be very dangerous to work with, and a lot of tech workers refuse to work with it. Whether you believe it is dangerous or not (I've seen arguments that it is no more dangerous than single-phase 208V) is immaterial; this is the opinion of a lot of workers.
48V gear is expensive if you're not in a datacenter already set up to handle it. Facebook obviously doesn't have this problem because they build entire datacenters from scratch.
I personally don't believe in it because it doesn't buy me anything that single-phase 208V doesn't give me, and I do not pay enough in electricity to justify the overhead of dealing with it.
It is not the voltage; it is the high number of amps that DC systems carry.
While you could get fried from a single rack server's failing power supply, it is likely that it will die of some other cause before zapping you if you take simple precautions.
DC on the other hand, puts "the forces of nature" up close and personal to the back of the rack.
A wrong move will vaporize the thickest of metal screwdrivers and can easily do irreparable damage to a human.
Something as simple as not removing your wedding ring can result in an inadvertent bridge between + and - ; after which, bad things happen.
Correct, 48VDC exchanges volts for amps. Basic electrical theory states volts * amps = watts. We measure computer power usage in watts for a reason.
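To put numbers on that, here's a hypothetical 10 kW rack (the wattage is made up purely for illustration):

    #include <stdio.h>

    int main(void) {
        /* Hypothetical 10 kW rack, purely for illustration. */
        double watts = 10000.0;

        /* I = P / V: the same load drawn from a 48 VDC bus vs. a
           single-phase 208 VAC feed. */
        double amps_48v  = watts / 48.0;   /* ~208 A on the DC busbar */
        double amps_208v = watts / 208.0;  /* ~48 A on the AC feed    */

        printf("10 kW at  48 V: %.0f A\n", amps_48v);
        printf("10 kW at 208 V: %.0f A\n", amps_208v);
        return 0;
    }

Same watts, a bit over 4x the current on the 48V side, which is exactly why the busbar hardware and the failure modes look so different.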
48VDC isn't sufficient to vaporize a metal screwdriver at the amperages used in most 48V datacenters. However, I know people who work with higher voltages than that, and they own expensive ceramic tools just for safety reasons. If I personally ran 48VDC and my workers requested ceramic tools, I would not hesitate to expense them.
And yes, if you're working around DC >12v, you should be seriously using every method available to make sure you don't become a conductor, including only using one hand at a time to touch power relays (to avoid having your heart stopped).
48V danger is debatable, but it might be a good idea to start adding RCD protection to the power sources. Yes, this will bring your rack down (or just one output), but that's better than having people stuck to live wires or tools causing short circuits.
Is this possible? Fault currents are much more obvious at a higher voltage than lower, and I thought most RCDs would not be made to be sensitive enough to catch what might be a dangerous DC leakage.
Yes (not sure if there are existing products), because they work based on the difference between the current leaving and the current returning. (Between 48V and 110V there isn't an insurmountable difference in the detection capability needed.)
Theoretically you should have zero leakage, and you should also trip on bigger currents (but a rack wouldn't use more than 10A maybe?)
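The comparison an RCD makes is trivial; whether the sensing hardware is sensitive enough at 48VDC is the real question. A sketch of the logic (the 30 mA threshold is borrowed from AC personnel-protection devices and is only an assumption here):

    #include <math.h>
    #include <stdbool.h>
    #include <stdio.h>

    /* Trip when the current leaving on the + conductor and the current
       returning on the - conductor differ by more than the threshold,
       i.e. when some of it is going through something (or someone) else. */
    static bool rcd_should_trip(double supply_amps, double return_amps) {
        const double threshold_amps = 0.030;  /* assumed, not from the thread */
        return fabs(supply_amps - return_amps) > threshold_amps;
    }

    int main(void) {
        printf("healthy rack:  %s\n", rcd_should_trip(9.80, 9.80) ? "TRIP" : "ok");
        printf("leak to earth: %s\n", rcd_should_trip(9.80, 9.72) ? "TRIP" : "ok");
        return 0;
    }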
Swapping racks may not be possible if you don't own the racks. Lots of datacenters are built with their own racks. You might be able to ask for extra deep racks but that's it.
It isn't about rack depth, but rack width. They are deep, but not unusually so.
Datacenters are not "built" with racks. They are not permanently affixed to the floor. Most datacenters do not keep empty racks on the datacenter floor, and keep the floor open.
If you are renting by the rack, no, most datacenters won't swap racks for you. You'd need to be renting entire cages for them to consider it.
Open Rack isn't really useful for small scale providers, it's more use to hyperscale companies like Facebook, Google, Amazon, and Microsoft. I don't think anyone we'd consider "medium scale" has adopted it (if I'm wrong, I'd love to see a story hit the front page about it).
Only one of those companies has adopted it; the others have considered it and, although they manufacture custom hardware and thus could take advantage of it easily, they have not done so.
I think Amazon does not want to use Open Rack; I'm not sure they've even considered it. I think they are in a position to negotiate good prices from the normal x86 server vendors, or to have them produce a modified version of a normal server (without the things Amazon does not need). I am curious what is up with them nowadays.
The Open Rack’s equipment bay has been widened from 19 inches to 21 inches, but it can still accommodate standard 19-inch equipment. A wider 21-inch bay, however, enables some “interesting configurations”, like installing three motherboards or five 3.5-inch disk drives side-by-side in one chassis. The outer width of the rack has remained a standard 24 inches to accommodate standard floor tiles.
They are going up against the coming Xeon E5 Broadwell + FPGA.
Power9 does offer more memory per rack, but I don't see why Intel can't adapt with a better memory controller.
To put it simply, what is the incentive to switch over to the Power9 platform?
Why? Intel can compete with Zen more easily, and it can always rebound even if Google does go with Zen.
But if Google switches to an entirely different ecosystem, dragging it back into x86 won't be easy, because all of their platforms will be built for a completely different architecture.
The next-generation uArch from AMD, which promises a 40% IPC improvement. Again, it won't be as fast as Intel, but at least AMD would be within reach for a price war, whereas now, even if AMD is 50% cheaper, it makes little sense to use them.
So this raises all sorts of questions: Can Intel be fast enough in integrating Altera (SW + HW + corporate...)? Which has the better FPGA development environment, with more developer share, etc.? FPGAs could cannibalize Intel's business - will they have an incentive problem? Do some companies (say, in China) prefer an open processor like POWER, and will that create an ecosystem advantage? Are there advantageous startups to buy, like kandou-bus (faster interconnect), and who will buy them?
Yes, and I think Intel is not certain to win, just much more likely. The Power9 here is targeting a 2H 2017 release, which actually puts it up against Intel's Skylake/Kabylake Xeon Purley platform in a similar timeframe.
Purley Platform, Skylake Xeon offers:
Up to 8 sockets and 28 cores per socket
6-channel memory controller, 12 DIMMs per socket
Support for Intel Xpoint NVDIMMs
48 PCIe Gen3 lanes
OmniPath 100G connection
Offers up to 1.5TB of memory on a 2S server, or 6TB of Xpoint.
If you push to the limit of 128GB TSV DRAM and 512GB Xpoint, that is potentially 3TB on a 2S server and 12TB of Xpoint.
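Those totals fall straight out of 2 sockets x 12 DIMMs per socket; the per-module sizes below are my back-calculation from the quoted totals, not something stated in the article:

    #include <stdio.h>

    int main(void) {
        int dimms = 2 * 12;  /* 2 sockets x 12 DIMM slots per socket */

        /* Module sizes assumed so that the quoted totals work out. */
        printf("DRAM    64 GB/DIMM: %5d GB (~1.5 TB)\n", dimms * 64);
        printf("DRAM   128 GB/DIMM: %5d GB (~3 TB)\n",   dimms * 128);
        printf("Xpoint 256 GB/DIMM: %5d GB (~6 TB)\n",   dimms * 256);
        printf("Xpoint 512 GB/DIMM: %5d GB (~12 TB)\n",  dimms * 512);
        return 0;
    }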
Not to mention Intel's network controllers. The whole ecosystem from Intel Cloud is actually quite amazing, both in hardware innovation and in the software and compiler work they are doing. It is the same lock-in as the PC/Windows industry, and unless you get a dramatically new way of doing things, you can't simply switch the mobile industry to x86, or vice versa, all by yourself, even if you are as big as Google. Then you get 10nm Intel servers in 2018/2019.
Again, I don't see the incentive for making the switch.
I don't have the link handy, but within the last month or two there was a story here on HN about a company (Facebook?) that determined it was better to go small on hardware. Which is to say that one or two socket servers were more of a sweet spot than 8 socket monsters, at least for generic loads.
That was Facebook. This backs up the same reason ARM hasn't managed to penetrate the server market yet. The story was about web servers, which don't need a lot of CPU power, just a decent amount of memory and good networking. When anyone thinks of low-power CPUs they immediately think ARM has a fighting chance here. Before ARM even got a foothold in the server market, Intel responded with an Atom server CPU. It turned out the market needed a more powerful CPU than they thought. Intel came again with Xeon-D, which has 2 to 8 cores and integrated 10Gbps Ethernet support at low power consumption. It was an instant hit and is now selling like hotcakes. They have recently updated it to as many as 16 cores. You can stack 8-10 of these in a 2U microblade chassis.
Again, when you take into account power, performance, and the other parts of the server, the TCO share of the CPU is relatively small. Even if you gain a 10% TCO improvement, you have to factor in the future roadmap of the CPU, as well as the software development, compiling, and testing costs involved.
Only to the extent that you can afford to spend $1 million+ and a year every time you change your algorithm. For bitcoin mining or encryption or decoding popular video formats then yes, ASICs are absolutely the way to go. But there are many cases where the algorithms you're using aren't so fixed or where you're not willing to put up with such large lead times.
If one has the financial option to diversify, one would be wise to use x86, ARM and POWER at the same time. There aren't many examples where monoculture has been beneficial to anyone but the artificially selected culture.
Yes, the first design will be out this year. A simple Google search should bring you lots of info. It will be a separate die on the same package; a true integrated single-die solution is on track for 2017.
Power8 and Power9 have a better memory controller than Intel's Xeon: more memory channels, higher bandwidth, and higher memory capacity.
I would be interested to hear more about this IO issue. Am I right to assume that, because of the genesis of the x86 architecture in desktop computing, it is not optimized for server-class IO, and that this permeates the design (i.e. it is difficult to catch up with a ground-up server architecture)? If that's true then this is a big deal for Power. Certainly my big-data workflows are usually memory/IO bound, not compute bound.
I wonder if the inclusion of NVLink in Power 8+ will cause Power to excel in ML applications. It could well be quite a bit faster than x86 just due to the memory/interconnect bandwidth.
NVLink and CAPI[1] both have huge potential for machine learning. However, a lot of the benefits of NVLink for ML come from GPU-to-GPU NVLink, which doesn't require CPU support.
1. CAPI doesn't seem to get mentioned too much around here, but imagine an FPGA directly accessing some shared system memory. It's neat.
Yeah, it's neat. (I work on stuff that exploits this). We open-sourced the software side of our first flash IO accelerator last year. [1]
You can do some pretty cool things from a HW designer's perspective inside the accelerator, and in the main application. Since the accelerator is cache-coherent, and able to map the same virtual addresses as a given process (and attach to multiple processes' address spaces) the device can do "simple" things like follow pointers, which used to require building a command / data packet, DMA'ing it to the device, and then waiting for a response packet. This, effectively, frees up the main CPU to do other things, rather than wrangle data. It also means that bottlenecks move.
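A conceptual sketch of that difference, in plain C, with an ordinary function standing in for the coherent accelerator (this is not the real cxl/CAPI API, just the programming model it enables):

    #include <stdint.h>
    #include <stdio.h>

    /* A linked structure living in ordinary process memory. */
    struct node {
        uint64_t     key;
        struct node *next;
    };

    /* What a traditional PCIe offload needs: the host marshals a command
       packet (usually with translated addresses), DMAs it to the device,
       and waits for a response packet. The device can't chase host pointers. */
    struct command_packet {
        uint64_t dma_addr;  /* an address the device has been granted */
        uint32_t length;
        uint32_t opcode;
    };

    /* CAPI-style: the accelerator shares the process's virtual address
       space and is cache-coherent, so you hand it the head pointer
       unmodified and it walks ->next itself. Simulated here by a plain
       function standing in for the hardware. */
    static uint64_t accelerator_walk(const struct node *head) {
        uint64_t sum = 0;
        for (const struct node *n = head; n != NULL; n = n->next)
            sum += n->key;  /* the "device" follows the pointer chain */
        return sum;
    }

    int main(void) {
        struct node c = {3, NULL}, b = {2, &c}, a = {1, &b};
        printf("accelerator sum: %llu\n",
               (unsigned long long)accelerator_walk(&a));
        return 0;
    }

The interesting part is what's absent: no packing of command_packet structures, no address-translation dance, no completion polling in the hot path.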
Correct; the CPU and the end-point accelerator both must cooperate to negotiate the CAPI link.
Disclaimer: I work on this with some very smart people @ IBM. Opinions are my own.
When a PCIe device is in CAPI mode, the PCIe protocol is used as a transport layer, but the CAPI protocol rides on top, and hardware in the CPU's PHB (the CAPP unit) and hardware in the accelerator (the PSL in this case) cooperate to present the common address space to the process and to the accelerator itself. [1] If a CAPI-capable card's plugged in to a non-CAPI-capable slot, it remains a PCI card. If a non-CAPI card's plugged in to a CAPI-capable system, it remains a PCI card. If both sides match on protocol versions and the kernel contains the cxl driver, the kernel will switch the slot into CAPI mode, and the CAPP unit and PSL effectively take over the PCI link on either side.
Thinking about this some more, there is no way the latency of PCIe would ever be as big as that of paging in memory from disk. So page faults can't really make this more important in Pascal.
NVLink is similar to a higher-bandwidth PCIe connection, except that multiple GPUs can be connected with it. It's primarily useful for very large convnets, which use a lot of memory and can be bandwidth-limited. It doesn't require any particular modifications to a model or framework to take advantage of it.
CAPI is much more flexible and interesting. It allows a CAPI-capable connected device access to a process's virtual memory. Essentially, you can extend the CPU's capabilities with CAPI. Usually this would be an FPGA (and the utility of FPGAs for machine learning is very much a research topic), but I could easily see a DSP being useful for voice recognition. GPUs can take advantage of it too, but ML work is usually just offloaded entirely to the GPU.
CAPI is very very cool and designed by some very smart people. I'm excited what people will do with CAPI and FPGAs.
CAPI allows an FPGA connected via PCIe to be treated as a coherent peer to the CPU cores that is able to hold cache lines and also use address translation. Among other things, from the application programmer's perspective, the CAPI accelerator can basically be treated as if it were another thread, since it can use the application's virtual address space - the application can set up data structures in main memory and pass unmodified pointers to the CAPI card.
Thanks for the link. The papers mentions key/value stores. Would a valid use case for CAPI be something similar to a "flash cache" where the FPGA is not as fast as DRAM but still faster than NAND flash?
IBM reps love to throw around the "Google is switching to IBM" line. Can they possibly compete with IBM on price? Why isn't AMD trying to reach this market?
These POWER chips are an open design; thus, anyone who wants Google as a customer and has a chip fab ready to go (like, say, AMD) could, in theory, just fab some up to sell to Google.
Maintaining a fab when you can't justify enough orders for chips to keep it running around the clock is expensive. As part of the spinoff they had penalties because they weren't purchasing enough wafers from GF anyway, but it resulted in less bleeding than owning and managing it as a subsidiary.
This is still largely at commodity prices / performance points. It's been quite some time since any of their hardware has looked consumer-oriented, but comparing this to what enterprises buy, it's apples and oranges.
ARM instruction set is pretty mature but the system architecture for servers is less so, IMO. I think there's little commonality for bootstrapping the various SoCs.
Yes, very hard to justify when a high-end 4-core Xeon outperforms on some benchmarks[0] and costs half as much or less. Not to mention vastly more mature open source floating point, compiler, etc. support. As well as existing (invalid, but still) programs that assume x86isms.
Most POWER8 stuff does seem to be in the "contact us" category. The only reference point I know of is that Tyan has (had?) a reference platform for 2850 USD, but that's basically beta quality and not intended to be production hardware.
It would be great if Dell / HP / Cisco / Lenovo and others started forging some POWER gear.
For a platform to succeed, it should provide a low barrier to entry - maybe a low-capacity P9 system at a low cost, ~1K? That would be a better strategy for OF.
I know Power is targeting cloud computing applications, but IBM should consider low-cost entry-level gear to gain some market share at the low end, which can then transition into higher-margin markets.