The article says the POWER9 chip will have "hardware assisted garbage collection" and that's all; that's a bit short for such an important topic.
Does anyone have more information about it?
I came here to ask the same thing, did some searching, and still haven't found anything.
Edit: Found this in PowerISA_V3.0.pdf
>Load Doubleword Monitored Instruction: Specifies a new instruction and facility that improves the performance of the garbage collection process and eliminates the need to pause all applications during collection.

>The ldmx instruction loads the same doubleword in storage as the ldx instruction.

>ldmx is intended for use by applications when loading data objects during times when objects are being moved as part of a garbage collection process. In this type of usage, all loads of object pointers by applications are performed using ldmx. Whenever objects within a given region of memory are to be moved, the garbage collection program sets the Load Monitored Region registers to encompass the region being moved, and sets BESCR GE,LME to 0b11 to enable Load Monitored event-based branches.
I think it refers to the ldmx instruction, the Load Monitored Region, and the Load Monitored Region registers:
>The Load Monitored Region registers are used to specify a Load Monitored region, and to specify the sections of the region that are enabled.

>The Load Monitored region is a contiguous region of storage specified by the Load Monitored Region register. Load Monitored regions range in size from 32 MB to 1 TB. All regions are powers of 2 in length and are aligned (see Section 1.11.1).

>The Load Monitored region is divided into 64 sections of equal length, each of which can be separately enabled. The Load Monitored Section Enable Register specifies which sections are enabled.
>Load Doubleword Monitored Instruction: Specifies a new instruction and facility that improves the performance of the garbage collection process and eliminates the need to pause all applications during collection.

Programming Note:

>The ldmx instruction loads the same doubleword in storage as the ldx instruction.

>ldmx is intended for use by applications when loading data objects during times when objects are being moved as part of a garbage collection process. In this type of usage, all loads of object pointers by applications are performed using ldmx. Whenever objects within a given region of memory are to be moved, the garbage collection program sets the Load Monitored Region registers to encompass the region being moved, and sets BESCR GE,LME to 0b11 to enable Load Monitored event-based branches.

>Subsequently, if an application program loads (ldmx) a pointer into an enabled section of the Load Monitored region, an event-based branch (EBB) will occur. The EBB handler will load the instruction from storage, and decode the instruction to determine the effective address from which the pointer was loaded. The handler will then load the pointer (obtaining the same value the application obtained), and determine where the corresponding object has been moved to, and update the pointer in storage so that it points to the object's new location. The EBB handler may also take other actions such as updating additional pointers, depending on the situation. After this processing is complete, the EBB handler sets BESCR LME,LMEO to 0b10 (since the taking of the EBB set these bits to 0b01), and then executes rfebb 1 to re-enable EBBs and return to the application program at the ldmx. The application program re-executes the ldmx, which now returns the updated pointer (which no longer points into an enabled section of the Load Monitored region), and continues.

>Other variations and extensions of the above procedure are also possible.
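In rough C-style pseudocode, the handler flow that programming note describes looks something like the sketch below. To be clear, every helper name in it is a hypothetical stand-in I made up for illustration, not a real API from IBM or any runtime:

```c
/* Conceptual sketch of the ldmx / EBB flow described above.
 * All of these declarations are hypothetical stand-ins. */
#include <stdint.h>

extern uint32_t  fetch_interrupted_instruction(void);          /* the ldmx that triggered the EBB */
extern uint64_t *decode_ldmx_effective_address(uint32_t insn); /* EA the pointer was loaded from  */
extern uint64_t  gc_new_location(uint64_t old_pointer);        /* collector's forwarding lookup   */
extern void      set_bescr_lme_lmeo(int lme, int lmeo);        /* write the BESCR LME,LMEO bits   */
extern void      rfebb_return(void);                           /* rfebb 1: re-enable EBBs, return */

void ldmx_ebb_handler(void)
{
    /* The EBB fired because an ldmx loaded a pointer into an enabled
     * section of the Load Monitored region. Find where it loaded from. */
    uint32_t  insn = fetch_interrupted_instruction();
    uint64_t *slot = decode_ldmx_effective_address(insn);

    /* Load the same value the application obtained, ask the collector
     * where the object has moved, and fix up the pointer in storage. */
    uint64_t old_ptr = *slot;
    *slot = gc_new_location(old_ptr);

    /* Taking the EBB set BESCR LME,LMEO to 0b01; restore 0b10 and
     * return, so the application re-executes the ldmx and now sees
     * the updated pointer. */
    set_bescr_lme_lmeo(1, 0);
    rfebb_return();
}
```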
Oracle did this in the past 2 generations of their UltraSPARC architecture.
At its core a `SELECT FROM` statement is a filter lambda (without any joins, or before/after a join has been done, depending on query optimization).
Fundamentally this is because an SQL table is a flat array, with rows being flat arrays within the table's memory space.
Calculating the pointer to fields within rows within a table is ridiculously parallel; you're limited only by SIMD width (this is 1 FMA + 1 add against a constant, ~2 ticks).
Also custom wide SIMD registers can allow for `VARCHAR(255)` equality statements to be done in 1 processor cycle.
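As a rough illustration of the pointer math being just a multiply-add per row, here's a sketch in plain C. The fixed-width row layout and the sizes are made up, and there's nothing SPARC- or Oracle-specific in it:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical fixed-width row layout: each row is ROW_SIZE bytes,
 * and the column we filter on sits at a fixed offset inside the row. */
#define ROW_SIZE   64
#define COL_OFFSET 16

/* SELECT ... WHERE col = key, expressed as a scan over a flat array.
 * Field address = base + i * ROW_SIZE + COL_OFFSET: one multiply-add
 * per row, which is what makes the scan so easy to vectorize. */
static size_t filter_eq(const uint8_t *table, size_t nrows,
                        int64_t key, size_t *out_idx)
{
    size_t nmatch = 0;
    for (size_t i = 0; i < nrows; i++) {
        int64_t col;
        memcpy(&col, table + i * ROW_SIZE + COL_OFFSET, sizeof col);
        if (col == key)
            out_idx[nmatch++] = i;
    }
    return nmatch;
}
```

A vectorizing compiler (or hand-written SIMD) can test several rows' worth of that predicate per instruction, which is presumably where the custom wide registers for `VARCHAR` comparisons fit in.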
I guess I shouldn't be surprised that when a database company swallows a CPU hardware division that the CPU might sprout some more database specific operations, but I was.
I would love to see some benchmarks of how much this speeds up some sample workload queries, if you happen to know of some (preferably not from Oracle's marketing dept.)?
Why am I not surprised by this. I sometimes find myself thinking I should check my bias against Oracle, just to make sure I'm not being too hard on them for past transgressions (for example, Microsoft is far from perfect but they've come a long way from many of their more negative past practices). Then I learn something like this, which, while it doesn't directly mean they are still continuing in predatory marketing and sales practices, doesn't exactly assuage any fears I might have had.
I currently work for a company that heavily uses Oracle technology. It's horrible. Our cumulative contracts are worth several hundred thousand dollars per year.
But they'll still cheat us on small things. We requested information on a training seminar (we had a free attendance voucher), and they lost our account information until a week after the seminar ended.
It's gotten to the point where I just laugh. Everyone knows they're cheating us, but upper management won't change.
Yes, if you shell out all the money to run it on every core and use a lot of memory, it is solid. But for bang for your buck, Microsoft SQL Server has a better support plan and a saner licensing model. My experience is working with smaller datasets (<200GB), so YMMV.
I mean I'd rather use Postgres. But large corps like having a vendor to call.
Check out EnterpriseDB. I don't know how good they are in practice. But their speciality is solving the exact problem you mentioned with Oracle vs Postgres.
That's interesting. For some reason, I was under the impression Oracle had basically killed off the hardware development portion of Sun. I don't recall where I got that idea though.
Good to see new developments in this line. The more (open) hardware platforms the better, IMO.
They killed the desktops. That, essentially, made SPARC mostly invisible to most people. They also do a crappy job of reminding the world that a software company (as they position themselves) can make decent hardware.
Ok, that makes two of us that read a pile of SW features in HW and thought about the 432. I think this is more like the i960, though, where a fast RISC gets some extra features. Always thought it might have succeeded if not for the BiiN and Ada crap. IBM's been playing it smarter with POWER by making sure it's compatible with and aids the software people actually use. So, POWER9 has a much better chance than Intel's prior attempts.
The power ISA specs tend to omit quite a few details about the hardware itself. I wouldn't be surprised if documentation was internal only and the first time we get any details about it is when IBM patches some open source software to take advantage of it.
They do respond to requests, albeit quite slowly. I wanted documentation to go with some kernel patches they posted to the Linux kernel (IOMMU related) that hinted at a mode I was interested in. It took them two years, but the docs showed up a few months ago. I suspect I wasn't the only one asking for more detail about system level (or firmware/hypervisor) features that aren't documented in the instruction arch. I think they understand, both at the top and the bottom, that the time for hiding stuff like this is over. Pushing it through the immense bureaucracy still takes time, but support from upper mgmt eventually gets it done. The one thing IBM has always been good at is documenting their hardware/software. Somewhere in the late 90's they started to hide much of the low-level documentation (like the rest of the industry), but I think the realization that doing that doesn't help them (or anyone else) is slowly sinking in, as the most successful companies are frequently the most open ones (and therefore have better Linux/open source support).
The Reduceron (an FPGA core for running pure functional programs) contains a very simple hardware garbage collector[0]. Almost definitely not the same sort of thing, though.
So, this is the chip that Google, Facebook and Rackspace used in their designs for the Open Compute platform[1][2]. Maybe the entry-level price will drop once companies like those are buying them (or their components) at scale.
Don't forget the Raptor Workstation. Allegedly, they plan to sell the thing for less than what SGI and Sun sold their boxes for in the past. Using POWER8-9 chips in a market where eBay's best deal is $4,000 for a POWER8 CPU. ;)
ISA 3.0 adds a new instruction 'darn' -- Deliver a Random Number. That's a pretty good asm mnemonic :) I wonder if anyone has dug into the details of how that works yet?
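The ISA describes it as `darn RT,L`, where L selects a 32-bit, conditioned 64-bit, or raw 64-bit result, and an all-ones value means "no random number available yet, try again". Assuming a POWER9 target, GCC-style inline assembly, and a toolchain built with -mcpu=power9, poking at it from C might look roughly like this (untested sketch):

```c
#include <stdint.h>

/* Read a conditioned 64-bit random number with darn (Power ISA 3.0).
 * L=1 selects the conditioned 64-bit form; the instruction returns
 * all ones when no random number is available, so retry a few times.
 * This only runs on POWER9-or-later hardware. */
static int darn_random(uint64_t *out)
{
    for (int tries = 0; tries < 10; tries++) {
        uint64_t v;
        __asm__ volatile ("darn %0, 1" : "=r"(v));
        if (v != UINT64_MAX) {
            *out = v;
            return 0;
        }
    }
    return -1;   /* hardware kept reporting "no entropy available" */
}
```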
"IBM has a well-known disdain for vowels, and basically refuses to use them for mnemonics (they were called on this, and did "eieio" as an instruction just to try to make up for it)."
I'm quite naive regarding Power processors; does anyone know where I can see any motherboards with prices for Power processors, or even whether any of the BSDs run on them? It seems like it would be quite a nice platform/OS combination that side-steps the traditional Apple/OSX vs Intel/Linux workstation options.
In the case of the IBM Power processor, it'd be difficult to find the individual pieces. The computers are typically sold as prefab units by IBM or a third party. Not much of a build-it-yourself market.
That said, they're expensive like any high end datacenter-grade hardware.
OpenBSD, FreeBSD, and NetBSD all have support for the older Apple G4 and G5 hardware. Obviously being older they are kind of slow (especially compared to amd64) but if you just use them as file servers they are fine. The biggest problem with BSD on PowerPC, in my opinion, is the lack of modern programming languages. I believe on BSD you are stuck with C, C++, ruby, python, and perl mostly. No node, no Java, no golang, no rust. If you use Debian you have access to Java.
What? Why wouldn't you have access to node? Source is available for those, and node even specifically addresses the BSDs[1]. I heard Go does something kind of odd with system calls, so I could see it possibly having a problem (but I really don't know enough to state definitively).
I'm not sure if rust targets that platform/arch combo, but I could see it being deferred while they focus on more common ones.
Edit: It just occurred to me that Node's (V8's) optimized JIT probably uses ASM or at least routines optimized for x86, so that would complicate deployment.
Yeah, v8 is pretty famous for its lack of interpreter, though it looks like IBM is working on a port. You could also use the Chakra or SpiderMonkey ports.
I figured. Although this isn't PowerPC, but Power, and what little I was able to find quickly seems to indicate the instruction set has evolved since it was branched for PowerPC. I imagine even if PowerPC targeted binaries work on Power, they wouldn't use the platform to the fullest.
The Go port is ppc64le, the new ABI (elf v2) which is Linux only at present, the BSDs support the elf v1 ppc ABI, which is a bit different and would need a different port, although it is mainly endianness and how function pointers work.
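For context on the "how function pointers work" part: under the older big-endian ELFv1 ABI a function pointer points at a small descriptor rather than at code, roughly the layout below, whereas ELFv2 points straight at the entry point. This is only an illustration of the ABI convention, not something you would normally declare yourself:

```c
#include <stdint.h>

/* ELFv1 (big-endian ppc64) function descriptor: a function pointer
 * refers to one of these (in the .opd section), not to the code. */
struct elfv1_func_desc {
    uint64_t entry; /* address of the function's first instruction */
    uint64_t toc;   /* TOC base the caller must load into r2       */
    uint64_t env;   /* environment pointer (unused by C)           */
};

/* Under ELFv2 (ppc64le) there are no descriptors: the pointer is the
 * entry address itself and the callee derives its own TOC via its
 * global entry point; one reason a ppc64le port doesn't carry
 * straight over to the ELFv1 BSDs. */
```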
As others have said s390x is nothing to do with Power.
I had thought s390x could be based on POWER8/9, but you're right, that doesn't appear to be the case. I apologise. Regardless, Go does consider S390x to be a separate port.
Yah, most of the descriptions (just a little extra decode, aka PowerPC AS), and the sharing of the hardware, actually apply to the iSeries, which _IS_ POWER based. But of course that is a minicomputer, not a mainframe...
It seems the vast majority of people who consider themselves knowledgeable about computers are just massively ignorant/confused by IBM's product lines and capabilities (and don't even get me started about the nonstop and openvms). Which is a bit of a shame, particularly in the case of IBM i, which is probably one of the most interesting OS's still in support/production.
Java is now available for BSD on powerpc64. A set of patches were developed to add ppc64 support and have been officially committed to the bsd-port of openjdk8.
Node also supports powerpc. Support was added courtesy the team at IBM.
I have both working on my PowerMac G5, running FreeBSD 10.2.
FreeBSD runs on modern Power I believe, but the other BSDs only run on older hardware AFAIK, although IBM is pretty good at providing hardware so that may have changed.
When POWER8 came out in 2014 I ran a couple of numerical benchmarks against Intel Xeon CPUs. In my benchmarks the POWER8 CPUs were slightly faster than the Intel Xeon CPUs when the data fit into the CPU's cache, so far so good (the speed advantage of the POWER8 CPU can easily be explained by the much higher clocking). But once I started running heavy benchmarks involving gigabytes of data the POWER8 CPUs were at least twice as slow. Now 3 years later POWER9 CPUs will come out that are about 1.5 times faster; in my opinion this is not enough to compete against Intel. Why would anybody want to get locked into a rare and more expensive CPU architecture if there is no speed advantage?
The idea of the OpenPower thing is that you are no more locked in than you are on x86, perhaps not quite as open as arm64 but it's pretty nice in theory.
Also, I have a major sad for our industry when people struggle so much with alternative architectures. Tons of software absolutely should not care, including JVM languages, scripting languages, and Golang. Most userland compiled languages should also minimally care. For lower level C stuff, regular compilation on non-x86 archs is often directly reflected in overall code quality (new compiler warnings, things like alignment correctness, cache/memory coherency model assumptions, data type assumptions, etc.). It is somewhat hard to port operating systems and "efficiency libraries" that invoke platform features, but it also takes a comparatively small number of people to do.
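For a concrete (made-up) example of the kind of latent bug that porting shakes out: x86 quietly tolerates unaligned loads through casted pointers, while stricter architectures may trap or misbehave.

```c
#include <stdint.h>
#include <string.h>

/* Parsing a 32-bit field out of a byte buffer. */
uint32_t read_u32_bad(const uint8_t *buf)
{
    /* Works on x86, but the cast assumes 4-byte alignment (and host
     * byte order); on stricter architectures it can trap, and it is
     * undefined behavior in C either way. */
    return *(const uint32_t *)buf;
}

uint32_t read_u32_ok(const uint8_t *buf)
{
    /* Portable version: memcpy avoids the alignment assumption
     * (byte order still has to be handled explicitly if the data
     * is in a fixed wire format). */
    uint32_t v;
    memcpy(&v, buf, sizeof v);
    return v;
}
```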
Most applications have been heavily optimized for x86 processors for years. There are so many applications littered with intrinsics and x86 assembler coded blocks, with algorithms tuned for x86 and Intel-specific memory characteristics. ARM got some attention because of phones, but only in the last few years.
Power8/9 are very decent processors with a lot of raw power, but it will take a more concentrated and long term effort to have a level playing field in the software landscape.
You might have issues with the quality of your compiler or you might be using POWER7 code and/or have VSX disabled. IME POWER8 is somewhat faster than high end Xeons (although not in terms of price/performance).
I compiled my code using the latest GCC version available, I even made sure I used the right compiler flags to enable e.g. hardware POPCNT support. I know that IBM claimed that POWER8 CPUs were faster than Intel Xeon CPUs but I found the opposite to be true (at least for the benchmarks I cared about). You can also find many other benchmarks on the web which are in favor of Intel Xeon CPUs compared to POWER8.
I am not against IBM POWER CPUs, I was just disappointed by POWER8...
Fair enough. I'm sure it's workload dependent, and also I have not tested huge data sets. I'm most interested in the overhead of virtualization and (separately) compile times.
I've developed for and used Linux and BSD on several platforms. Moving to a different one is trivial unless you're dependent upon proprietary binaries which can't be rebuilt. If you're using open code, or code which you have the ability to rebuild for the new platform, you're not locked in.
I don't know why, but I always had a thing for Power CPU's. Really curious how Power9 compares to Power8 and Intel Broadwell-EP/EX when it comes to power efficiency (and of course performance wise).
Power CPUs are also the only ones currently on the market that work with open source firmware from the ground up – everything else needs proprietary microcode and/or firmware blobs.
I don't think that is 100% true; you can bootstrap ARMs with ARM Trusted Firmware + TianoCore. That doesn't mean that a large number of ARM devices actually can be bootstrapped that way, due to decisions made by the ARM licensees.
No, really. I get it, building your own CPU is fun. But they're all useless for the debate at hand, because nobody wants obsolete, slow, power hungry CPUs just because they're free.
At least one company wants this CPU [1] (but since one of the reimplementors of the SuperH architecture, or whatever you want to call them, works there - as CEO in fact - take it with a huge grain of salt).
Since you seem to know about the power consumption, can you please share some numbers about the power consumption of the jcore CPU (on a process comparable to other CPUs built for IoT devices, USB controllers, etc.)?
jcore is not intended to be a replacement for the main CPU, so this would be really interesting to me, and would save me from asking on the mailing list. ;P
I loved the PowerPC chips, the 604 and G5 in particular. Too bad it never became a viable platform outside of Apple.
(In the '90s, the PowerPC had very competitive price/performance, but there wasn't a viable desktop OS available after Apple terminated the original MacOS clone program. 5-10 years later Linux would have been much more ready, but the chips were not competitive with Intel anymore.)
I can't help but wonder if Intel "won" by brute-forcing the lithography side. But this then made them "blind" to the reduced-cost side of Moore's Law (or they tried for the longest time to get people to ignore it).
Their first reminder was perhaps netbooks, using semi-retired Celerons to make small and cheap clamshell web browsers and media players.
Their second, and ongoing, reminder is smartphones/tablets.
PowerPC essentially died as a desktop and server processor once Apple decided to go Intel. IBM decided to go back towards Power's roots with the POWER5 (the follow-on to the PPC970 in the Power Mac G5 and the PowerPC-based XServes).
PowerPC really died in 1997-98 when it became clear that Jobs wasn't going to license Mac OS (classic or X) for 3rd party PowerPC boxes anymore, and Windows NT on PPC wasn't going anywhere.
Without an operating system, IBM and Motorola didn't have an incentive to build PC chips anymore. The G4 happened because it was already in development and the vector extensions were useful for Motorola's embedded ambitions. The G5 happened because Apple basically paid IBM to make a desktop chip out of their POWER designs, AFAIK.
Around 2004, there was a startup PowerPC maker called P.A. Semi [1] that apparently competed for Apple's Mac CPU business. After Apple went Intel, they acquired P.A. Semi to design iPhone chips instead.
For large swaths of its life, the PowerPCs in a Mac were significantly faster than any available x86 CPUs. Back in the single core days, clock speed was something the lay user vaguely understood was correlated to performance. This had a subtle but real impact, as Intel and OEMs were effective in marketing the clock speed number as something users should care about. It was very effective until it backfired when AMD torpedoed Intel with Opteron, and then clock frequency scaling became undesirable.
The idea that PPC was faster than x86 back in the PPC Mac days is Apple hyperbole. There were a few occasions where the fastest PPC was faster than the fastest x86 available, but they were few and far between and very slim. If all you looked at were the very carefully selected/tuned Apple benchmarks (the Photoshop filter benchmark using AltiVec comes to mind), you can be forgiven for thinking there was a huge PPC advantage.
Here is a random google search for spec performance over time of assorted CPU arches.
The only obvious case in that graph where PPC>x86 is in mid 1995.
I can't find the one I saw a couple years ago detailing the core/clockrate comparisons of PPC's shipped in Macs with common PCs, that was much easier to read and far more obvious.
It wasn't just Apple hyperbole, PowerPC really did beat Intel at quite a few instances, and generally kept up until about 2004. AltiVec with photoshop was not just a tuned benchmark but a significant advantage on a common scenario. The Power chips also consumed significantly less power, enabling Apple to put far more powerful CPU's in laptops.
(It always confused me that my PowerPC 603 @ 75 MHz (a Performa 6200) was so, so much slower than my PowerPC 601 @ 80 MHz (a Power Macintosh 8100) – it wasn't until years later that I learned that the '601 could execute three instructions per clock whilst the '603 could only execute two...)
Indeed, performance per watt is the real question. They almost caught up with Intel with the Power8. If they manage to beat Intel with Power9, I think that would make a lot of folks take notice.
As much as I think the Wii, the Wii U and the Xbox 360 are all the equal-best gaming consoles ever (yes, I judge my gaming consoles purely by their CPU architecture), each of those consoles uses a custom chip developed under a specific IBM-{Nintendo, Microsoft} agreement, so they're somewhat different chips from the POWER[n] series :)
My dad got it while working at the university and at some point in time in high school gave it to me. The Lucite is pretty scratched up by now, but it has a weird sentimental value to me so I'm planning on getting it buffed out.
No, although I prefer the die shot from Power8 (probably because of better symmetry) [0], I like it as well. But nothing comes close to Nvidia GF100 for me. [1]
This is going to be the latest refinement of the POWER ISA, replacing the current POWER8 used by IBM, and licensed by some Chinese company for not yet released products.
It's a new revision of an existing ISA. POWER8 implements Power ISA 2.07; POWER9 implements 3.0.
The Freescale part of NXP was formed out of Motorola's semiconductor division - Motorola, in turn, was part of the original AIM Alliance with Apple and IBM back in the early 90s that developed PowerPC. So they've been making PowerPC stuff since the very beginning. I would assume they have a license of some sort from IBM.
Note the Freescale chips implement the embedded variant of the Power ISA ("Book E", IIRC). They won't run server software. In particular, vector instructions are missing -- server OSes like RHEL will crash very early in boot. And last I looked, little endian support was missing or broken. Big endian is basically dead on modern server POWER, although at Red Hat we still build both endians for now.
So realistically, does processor architecture matter today from the customer's point of view? I mean performance and all that matters, but presumably that will be offset by market forces. For example, I rent a couple of ARM servers from Scaleway as well as a whole bunch of x86-64 servers from various providers, and they can all do the same stuff. So why is it that x86 is the chip data centers prefer, when they could have a variety of chips and let the customers and the market choose?
Online.net, the folks behind Scaleway you mentioned, do actually have a dedicated POWER8 server available. While their cheapest dedicated servers (C2350 Avoton / 4GB) go for €9/mo, and a somewhat more comparable 2xE5-2660v3 / 256GB goes for €320, you have to shell out €1600/mo for the POWER8 / 512GB.[1] Which is to say, their other prices seem OK, but the POWER is still expensive.
Also, IIRC they've had POWER servers available for quite some time (possibly already POWER7), so maybe there is demand. Assuming so, it is either actually more efficient to a great degree than x86-64, or sufficiently more efficient so that it can solve some problems the x86-64 ones can't at all.
Small and medium size customers just want the cheapest architecture they can get, which is x86 or ARM for servers and mobile, respectively. Large customers want leverage against Intel and ARM, so alternative architectures are more important to them.
The only thing that matters in data centers is power consumption, so just as soon as these compete on compute per joule, data centers will roll them out.
Google/Facebook datacenters maybe, but there are others where backwards compatibility is actually the most important metric, because the companies in question have been investing in nonportable software for decades. Those places tend to be massively heterogeneous, as different businesses merged and absorbed each other's technology stacks or layered newer technologies over older ones.
Yeah, absolutely. Your basic corporate in-house development shop using C# and SQL Server is married to Intel for all eternity. I think the Google/Facebook crowd is really the best market for this IBM stuff, because those companies control their entire stack and have the technical ability to port it.
Well, they released SQL Server for linux (even if pared down feature-wise), and that's probably a bigger hurdle than recompiling for a new architecture (depending on whether their compilation toolchain supports it, and how well), although there could still be some x86(_64) specific optimizations in place.
Sometimes weird stuff like supporting esoteric platforms happens because the companies in question are so large, that it makes sense to spend a few $100k on that project to make Intel squirm and get better pricing from Intel on your bulk deal. Keep in mind, Microsoft have their own cloud service...
I think the "scale" being discussed is semiconductor manufacturing, where the area matters and the unit count does not. Each of Intel's CPUs is between 100x and 10x larger than each of those ARM chips. For example a Xeon E5 has a die area of > 450mm^2, the Apple A9 is about 100mm^2, a Tegra 2 is ~50mm^2, and a Cortex is < 5mm^2. Those unit counts you mention are dominated by those smaller parts.
Do these IBM systems have anything like Intel ME that might be used by Big Bro for remote, covert rooting or subversion of the server? If yes on the standard model, can we order a version of these systems that lack that feature?
I'd love to see some market forces nudging Intel to offer CPU's and Motherboards without ME.
So, is the 24-core, 4-thread variant just the same as the 12-core, 8-thread variant but with each (ridiculously) wide core statically partitioned in two?
I don't understand why IBM is not creating their own cloud services like Amazon, MS or Google. They have talent, they make their own hardware and software. They are struggling a little bit with this new software landscape, so why not just do it?
Agreed. I wasn't looking for a buy button. However, an authoritative info page by IBM itself providing technical specs and details of the platform would earn them good credits at least amongst the tech audience.
Also likely because it's not targeted at consumers in the same way. Intel doesn't release products like that. Dell and Apple do for their products that integrate Intel chips, though.
I have no direct knowledge on IBM's processor, but from working on other processors, usually it's due to product yield.
Today's cutting-edge fabrication of multi-core chips makes it quite unlikely to get many dies where all the cores work at high speeds. A chip must run at a single frequency, so the slowest core is the bottleneck, and if you fabricated 32 cores and one failed testing, the whole chip would be garbage.
So the strategy is that the chips are fabricated with 32 cores, and then they speed-test the cores and take the fastest 24. The remaining 8 are fuse-disabled, and the end user sees a 24-core chip.
Note this strategy also allows them to produce a 16-core chip with the same die, if for example only 20 cores work at high speeds.
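To put a rough number on why binning helps, here's a toy calculation (the per-core yield figure is made up): with an independent per-core yield p, the chance that all 32 cores are good is p^32, while the chance that at least 24 are good is a binomial tail sum, which is dramatically higher.

```c
#include <math.h>
#include <stdio.h>

/* Toy yield model: 32 cores, each independently "good" with
 * probability p. Compare P(all 32 good) with P(at least 24 good). */
static double binom(int n, int k)   /* n choose k */
{
    double c = 1.0;
    for (int i = 1; i <= k; i++)
        c = c * (n - k + i) / i;
    return c;
}

int main(void)
{
    const int n = 32, need = 24;
    const double p = 0.95;            /* made-up per-core yield */
    double all_good = pow(p, n);
    double at_least = 0.0;
    for (int k = need; k <= n; k++)
        at_least += binom(n, k) * pow(p, k) * pow(1 - p, n - k);
    printf("P(all %d good)      = %.3f\n", n, all_good);
    printf("P(>= %d of %d good) = %.3f\n", need, n, at_least);
    return 0;
}
```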
Yes, that's sometimes done, but in this case, from the die picture posted here in another thread [1], it seems to indeed have 24 cores in 12 pairs (each of the "houses").
Symmetric doesn't mean 'power of two'. Quite a few ICs have had even numbers of cores on them (I'm not aware of any with an odd number of cores, but that doesn't seem like the end of the world either). Lots of other things have to fit on the die, too (e.g. unified cache, which takes up a lot of real estate in chips that have them)
slightly off topic, but for some reason your comment made me remember the time I had to do some regression testing of a fix backported to a really old version of our software.
We had to test against the OSs we'd since dropped support for, which included HP-UX, AIX and s390.
The first time I heard someone mention a "31bit" build, I thought they were taking the piss :P It just seemed so odd at the time...
IMO Xeon will be tough to unseat for performance. But a lot of server consumers only want to be able to scale ever-more guest OS/containers on a single platform, so these challengers do pose a real threat to Intel's server market.
Open source software has been a powerful economic force: the fact that so much software exists that's designed to target many similar platforms and is frequently recompiled for those targets means that the impact to the end-consumer of switching among x86/ppc/arm is much lower than it was in the past.
I don't know if they have to beat Intel on performance per core. There are markets that require the best performance per core, but there is also a big market that wants more reasonably powerful cores for a given price point.
For instance, in the VPS hosting market, you see most providers offer "vCPUs" (virtual CPUs), because then they can sell their customers "four cores" (or 20 cores) on the cheap, and so on. It's a good marketing strategy.
What if they can offer real cores for the same low price? That would be pretty appealing to their customers. They can also offer "24-core dedicated servers" on the cheap, and so on.
Also, in the long run, it would be best for hosting and cloud providers if they would at least adopt a strategy of 50% Intel chips, and 50% AMD + Power + ARM. That should get them much better prices in the future (from all the chip providers), if they did that. Monopolies don't serve anyone but the monopolist.
And maybe Mill Computing can follow through with the claim that their belt based, general purpose DSP CPU design can beat all other offerings at least with regard to power use. I wonder how long it will be until physical chips are out. Or at least new talks, they were really good.
I was enthralled watching those talks, and storing high-level bytecode for executables that gets targeted to the specific machine at runtime is a concept I've always found cool (I can install an ancient AS/400 app that was originally built for IBM's ancient 48-bit CPUs on our shiny POWER7-based IBM i box, as long as it doesn't use any private APIs).
However, we'll see, if and when they produce real silicon, whether the concept will be another Itanic; even AMD moved away from VLIW designs for their GPUs for performance reasons ~4 years ago (though, to be fair, they don't exactly have the $$$ to pay for tremendous numbers of driver developers to optimize their code generation anymore).