Parallella: A Supercomputer For Everyone is Dying (kickstarter.com)
102 points by ahalan on Oct 26, 2012 | 64 comments



Dying? At $612K of $750K raised with 31 hours to go? Is there something I don't understand there?

EDIT: Realizing that mods might change the title at any time, it is right now "Parallella: A Supercomputer For Everyone is Dying"


With 31 hours to go. If they don't raise $138,000 in 31 hours they get squat.


The title suggests they get nothing if they don't raise the $750,000 target, which is true.


Nearly $618k as of 16 minutes after your post...

EDIT: $624,602 44 minutes after your post.... I'm getting more optimistic.


According to kicktraq, they're indeed trending towards not making it (trending to 651k, 86% of goal):

http://www.kicktraq.com/projects/adapteva/parallella-a-super...


They only have 31 hours left to raise the rest of the money. Considering the fact that they took several weeks to reach this point, it seems unlikely that they will hit the goal, and therefore they will not secure any of the pledged funds.


It slowed down for a long time, but it's sped up again the last few days, and in less than the last 20 minutes they've gotten another $6k. It's going to be close either way it seems.


I seem to recall repeatedly reading that Kickstarter projects collect the majority of their funding in the first 72 hours and the final 24 hours.

My anecdotal experience backing and following half a dozen projects agrees with this.

I'd be quite surprised if this project doesn't make it.


Could someone explain how this is different from GPU computing and regular multi-core CPU computing?

I realize there is a difference...but I'm not quite sure I grasp it yet. GPU computing is a lot of parallel math computations with limited shared memory. I'm assuming the Epiphany CPU is more capable than the simple GPU math units?

How's it different from multi-core CPUs? Just the sheer quantity of cores they have packed in there?


I've done my master's thesis on GPGPU, so maybe I can help out a bit. I'm not yet too familiar with Epiphany's design, however. From what I could grasp, what sets it apart the most is a different memory architecture compared to multicore CPUs: the individual cores seem to be optimized for accessing adjacent memory locations as well as the memory of their direct neighbors. On this point the architecture seems similar to GPUs, although GPUs have a very different memory architecture again - to the programmer it might look similar, however, especially when using OpenCL.

The main point where Epiphany diverges from GPUs is that the individual cores are complete RISC environments. This could be a big plus mainly when it comes to branching and subprocedure calls (although NVIDIA is catching up on the latter point with Kepler 2). On GPUs the kernel subprocedures currently all need to be inlined, and branches mean that the cores that aren't executing the current branch are just sleeping - Epiphany cores seem to be more independent in that regard. I still expect an efficient programming model for Epiphany to be along the same lines as CUDA/OpenCL, however - which is a good thing, btw.: this model has been very successful in the high performance community and it's actually quite easy to understand - much easier than cache optimizing for CPUs, for example.

If we compare Epiphany to a CPU, what's mainly missing is the CPU's cache architecture, hyperthreading, long pipelines per core, SSE on each core, and possibly out-of-order execution and intricate branch prediction (not sure about those last ones). The missing caches might be a bit of a problem. The memory bandwidth they specify seems pretty good to me, but from personal experience I'd add another 20-30% to the achievable bandwidth if you have a good cache (which GPUs have had since Fermi, for example). The other simplifications I actually like a lot - to me it makes much more sense to have a massively parallel system where you can just specify everything as scalar instead of jumping through all the SSE and hyperthreading hoops like on CPUs - optimizing for CPUs is quite a pain compared to those new models.
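To make the branching point concrete, here is a minimal C sketch (illustrative only - not tied to the actual Epiphany SDK or to any real OpenCL kernel):

    #include <stddef.h>

    /* Each element takes one of two code paths depending on its value.
     * On independent scalar cores (a CPU core, or presumably an Epiphany
     * core working on its own chunk), each element only pays for the path
     * it actually takes.  On a 16-wide SIMD unit or GPU warp, whenever
     * the lanes disagree both paths are executed with the inactive lanes
     * masked off, so the worst case costs roughly the sum of both paths. */
    void process(float *data, size_t n)
    {
        for (size_t i = 0; i < n; i++) {
            if (data[i] > 0.0f)
                data[i] *= 2.0f;                    /* cheap path     */
            else
                data[i] = data[i] * data[i] - 1.0f; /* expensive path */
        }
    }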


Assuming you're programming it in OpenCL, it's effectively a GPU with many more SMs but with a narrower SIMD width. If they were to give it, say, 16-way predicated SIMD with incomplete IEEE compliance on par with the Cell (~4M transistors per core plus a wider internal bus), it would become a very interesting processor IMO, with ~1.4 TFLOPs per 64-core Epiphany board. At the very least, they'd get bought out if they built such a beast and undercut NVIDIA, AMD, and Intel. Just sayin'...
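(Rough arithmetic behind that figure, assuming one fused multiply-add per lane per cycle and a clock somewhere around 700 MHz: 64 cores × 16 lanes × 2 flops per FMA × ~0.7 GHz ≈ 1.4 TFLOPs.)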

In the meantime, leave the fast atomic ops, ECC, and full IEEE compliance to the GPUs and Xeon Phis of the world until you have the transistor budget to go after them...

All IMO of course...


> If they were to give it, say, 16-way predicated SIMD

I think that would completely defeat the purpose of the architecture, as it'd massively bloat the transistor count per core. Their roadmap is for 1000+ independent cores on a single chip, not stopping at 64 per board.


And there's the problem: my personal bias from years and years of GPU programming is that I'd rather target 4 cores with 16-way SIMD than 64 cores each with scalar, or to quote Seymour Cray - "If you were plowing a field, which would you rather use: Two strong oxen or 1024 chickens?"

Besides, this is 28 nm technology and 15x15 mm, no? That's 225 mm^2. AMD's 28 nm Tahiti is 365 mm^2 with 4.3B transistors, making this thing ~2.7B transistors give or take or ~41M transistors per core. Adding 4M transistors (source: it's about 1M transistors on a Cell chip per 4-way SIMD unit) is <10% larger in exchange for 16x the floating-point power. Unless I'm missing something, I'd build that chip in a minute...

Which is to say I don't want 1000+ wimpy cores - it'll get smashed by Amdahl's Law - when I can have ~900 brawny cores. NVIDIA and AMD have been exploring this space for almost a decade now and to start over without considering what they may have gotten right and what they have learned while doing so seems a little daft to me.


> I'd rather target 4 cores with 16-way SIMD than 64 cores each with scalar

You're assuming problems that are suitable for SIMD. If you have problems suitable for SIMD, use a GPU. Lots of problems are NOT suitable for SIMD.

If those 64 data streams all happen to require branches regularly, for example, your 4x 16-way SIMD is going to be fucked.
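(Rough worst case: if the lanes of a predicated SIMD unit diverge badly, the unit steps through the taken paths one after another with the other lanes masked off; with only one lane doing useful work on each path, utilization falls toward 1/16.)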

> Besides, this is 28 nm technology and 15x15 mm, no?

Where did you get that idea? Their site states 2.05mm^2 at 28nm for the 16 core version. 0.5mm^2 per core.

So by your math, more like ~26M transistors, or ~1.6M per core. Your estimated die size is 70% larger than what they project for their future 1024 core version...

Source: http://www.adapteva.com/products/epiphany-ip/epiphany-archit...

> it'll get smashed by Amdahl's Law

This is a ludicrous argument when arguing for a GPU architecture instead. A GPU architecture gets affected far worse for many types of problems, because what is parallelizable on a system with 64 general purpose cores may degenerate to 4 parallel streams on your example 4-core 16-way SIMD.

There are plenty of problems that do really badly on GPU's because of data dependencies.

> when I can have ~900 brawny cores

Except you can't. Not at that transistor count, and die size, anyway.

> NVIDIA and AMD have been exploring this space for almost a decade now and to start over without considering what they may have gotten right and what they have learned while doing so seems a little daft to me.

Have they? Really? They've targeted the embarrassingly parallel problems with their GPUs, rather than even trying to address the multitude of problems that their GPUs will simply run mostly idle on, leaving those to CPUs with massive, power hungry cores and low core counts. I see no evidence they've tried to address the type of problems this architecture is trying to accelerate.

Maybe the type of problem this architecture is trying to accelerate will turn out to be better served by traditional CPUs after all, but we know that problems that don't execute the same operations on a wide data path very often are not well served by GPUs.


Mea culpa on the die size...

That said, this is where the R&D done by AMD and NVIDIA has expanded what is amenable to running on a GPU. Specifically, instructions like vote and fast atomic ops can alleviate a lot of branching in algorithms that would otherwise be divergent. It's not a panacea, but it works surprisingly well, and it's causing the universe of algorithms that run well on GPUs to grow IMO.
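(One concrete example: CUDA's warp vote intrinsics, __any() and __all(), let a whole warp evaluate a predicate together, so an expensive branch can be skipped outright when no lane needs it, rather than executed under a mask.)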

What I worry about with Parallella is that by having only scalar cores, and lots of them, it has solved issues with branch divergence in exchange for potential collisions reading data from and writing data to memory. The ideal balance of SIMD width versus core count is a question AMD, Intel, and NVIDIA are all investigating right now. But again, ~26M transistors - no room for SIMD...


There is certainly something to what you say. The advantage of the GPU model is that you can have the ALUs occupying a much higher percentage of your die if each core is less independent. Independent threads are not necessarily what you need on an accelerator card - that's what you have CPUs for anyway.


Why plow a field with 1024 chickens, when you can plow it with 1M worms?

The GA144's F18 core has ~20 thousand transistors and is asynchronous. Make the die the size of an Opteron's, wait until you can pack 20B transistors on a die, and you get one million cores.


That chip would be so cool if only its native internal representation were 32-bit... Sigh...

But it's way better than this monstrosity: http://web.media.mit.edu/~bates/Summary_files/BatesTalk.pdf


It's closer to regular multi-core CPU computing than GPU computing. It's general purpose cores.

What sets it apart is that the cores are tiny, with little per-core memory (though all cores can transparently access each other's memory as well as main memory), and so the architecture is well suited for scaling up the number of cores with quite low power consumption.
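A hedged sketch of what that flat, transparently shared address space could look like from C (the address macro is illustrative - the bit layout is my recollection of the architecture docs, and the real SDK will have its own helpers):

    #include <stdint.h>

    /* Illustrative assumption: each core's local SRAM is mapped at a fixed
     * global address derived from its (row, col) position in the mesh, so a
     * "remote" store is just an ordinary pointer write that the on-chip
     * network routes to the right core. */
    #define CORE_MEM_BASE(row, col) \
        (((uintptr_t)(row) << 26) | ((uintptr_t)(col) << 20))

    static inline void send_to_core(int row, int col, uint32_t offset, float value)
    {
        volatile float *remote =
            (volatile float *)(CORE_MEM_BASE(row, col) + (uintptr_t)offset);
        *remote = value;   /* plain store into that core's local memory */
    }

No caches means no coherence protocol to worry about, but it also means the programmer (or the tooling) has to think about where data lives.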

So for problems that can be parallelized reasonably well, but with more complex data dependencies than what a GPU is good for, this might be a good fit.

I'd put it somewhere in the middle between GPU's (for embarrassingly parallel tasks) and general purpose CPU's with high throughput per core.

Also, this looks like it'd be possible to fit in the power envelope of really small embedded systems, like e.g. cellphones and tablets....

Before more developers have these systems, it'll be hard to say how useful they'll be, but the architecture looks exciting.

That's why I supported it - I really want to see how this type of architecture can be exploited, and whether or not it'll prove to be cost effective and/or simpler to work with than GPU's for the right type of problems.


IMO this combines some of the worse features of Cell (e.g. local memory and DMA) and GPUs, and while the power efficiency is good the absolute performance is very low. For a parallel noob who's using OpenMP/OpenCL I don't think it's any better than a desktop PC because programming it is going to feel the same and performance is going to be equal or lower. And if you don't use the libraries then you're in low-level ninjas-only land — the extremely simple and flexible hardware is good in theory because you can use it many different ways, but it also doesn't help you or give any hints about how to properly exploit it.


It's not meant to compete with a desktop PC, or with a mass produced GPU.

It's meant to be a development platform for solutions based on their architecture and for people to get familiar with the development model, with an existing 64-core version of their chip and future versions intended to put 1000+ cores on a board as the eventual target.

That it's also a reasonably capable platform to run Linux on (on the ARM chip) so you can do development directly on the board is an added bonus.


If the architecture is not good then people don't want to get familiar with it. I'm skeptical that even an "eventual" version with hundreds of cores would be worth using.


Well, clearly at least 3700 people want to get familiar with it based on the number of backers so far, which is pretty good for a niche platform like this.

We'll find out soon enough.


Low-level ninja reporting for duty.

Software router, possibly.


Typical multicore CPUs don't have nearly as many cores as Parallella. Also, from what I can tell from www.adapteva.com/introduction, the power consumption is much lower and the interconnect is different. In Parallella, cores are laid out in a grid and can only talk directly to their neighbors.


For some reason, I thought it related to the concept of copying the brains of the dead to a computer so that they can live on as disembodied souls.

It just turns out to be a kickstarter project for a powerful computer.


It's not so much that it's a powerful computer, but a computer architecture that can scale up to be a very powerful system. The version they're trying to fund is a cost reduced version including their 16 core chip. They also have a 64 core chip, and plan to scale it much higher.

It's differentiated from GPU's in that each core is a simple but fully independent CPU core, with direct access to main system memory AND to the memory of the other cores.

This current project is most interesting as a means for people to start playing with the architecture rather than for the raw performance.


How is it different from Tilera and Intel's Xeon Phi?


I've not had the time to read up on Xeon Phi, but compared to the Tilera, the Epiphany is a considerably simpler processor. There's no MMU in the cores, instead of caches there is direct DMA control, and the on-chip network extends past the edges of the chip (that's all the I/O; there are no peripherals in the chip). It all adds up to something you can scale by mounting more of them on a board, assuming your task is sufficiently adaptable to a data flow (since the external bandwidth scales slower than the number of cores).

It's not at a level where you can run a general purpose operating system with virtual memory and memory protection (though extending it for that would be fairly easy - perhaps Epiphany V?), nor does it (currently) run multiple threads per core, but this simplicity affords it a much lower power expense.

A GPU may be more similar, as those tend to have prefetch operations and no memory protection, but they are designed to have huge bunches of threads doing the exact same type of work. They look like vector processors handling between 16 and 128 identical operations per control core (each a multiprocessor).

Mainly, the Epiphany is easier to program, but optimization is a different story (similar to the place and route process FPGAs need). It's a move toward a data and control flow granularity currently not available at a price for individuals. And to make it more useful, those individuals need to try things.
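For what it's worth, "instead of caches there is direct DMA control" usually translates into a hand-rolled double-buffering pattern, roughly like the C sketch below (dma_start/dma_wait and compute are hypothetical placeholders, not the actual Epiphany or Cell API; assume total is a multiple of CHUNK):

    #include <stddef.h>

    #define CHUNK 1024

    /* Hypothetical DMA primitives standing in for whatever the real SDK
     * provides: kick off an asynchronous copy into local SRAM, then wait
     * for the outstanding transfer to finish. */
    extern void dma_start(void *local_dst, const void *remote_src, size_t bytes);
    extern void dma_wait(void);

    extern void compute(float *chunk, size_t n);   /* the per-chunk work */

    /* Double buffering: while the core computes on one buffer, the DMA
     * engine is already fetching the next chunk into the other buffer.
     * This is how memory latency gets hidden by hand when there is no cache. */
    void process_stream(const float *remote, size_t total)
    {
        static float buf[2][CHUNK];
        int cur = 0;

        dma_start(buf[cur], remote, CHUNK * sizeof(float));
        for (size_t done = 0; done < total; done += CHUNK) {
            dma_wait();                        /* current chunk has arrived  */
            if (done + CHUNK < total)          /* prefetch the following one */
                dma_start(buf[1 - cur], remote + done + CHUNK,
                          CHUNK * sizeof(float));
            compute(buf[cur], CHUNK);
            cur = 1 - cur;
        }
    }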


Vs Xeon Phi: cost, complexity, power. Look at pictures of the Xeon Phi cards. They're covered by a heat sink and a fan. For comparison, the Epiphany chips are a single tiny die with no cooling. But of course the per-core performance is not likely to be anywhere near Xeon Phi either.

I'd consider Epiphany the simple, "slow" (per core), low power solution, with Tilera somewhere in the middle, and Xeon Phi at the other extreme (complex, fast per core, high power usage).

That said, this is speculation based on reading articles - I've not had my hands on any of the three. Yet :)


I'm embarrassed to say the same. I had forgotten about this Kickstarter, so I thought this was some sort of MRI-supercomputer backup plan, like cryostasis.

Personally, I'd prefer to blame my mental auto-correct. I'm so used to seeing simple grammar mistakes on the internet that I'm in the habit of reading in the missing words. In this case, the title wasn't "Parallella: A Supercomputer for Everyone Who is Dying", which makes this less interesting, but at least this could be real. And from the looks of it, it probably will be.


At some point in the next 24 hours, the backers who signed up for one unit will need to ask themselves if they would rather have two units or zero.

I can't speak for the other 1800 people in my bin, but I just decided on two.


Isn't that (multiples as pledge awards) a direct violation of the letter and spirit of Kickstarter's new rules?


I'm happy to see it get more attention, but to say it is "dying" is a bit hyperbolic. Sure, the Kickstarter campaign seems like it's unlikely to meet its target.

But from the sounds of it I don't think the company behind it will just give up if that happens. I know for my part if they put up another campaign, preferably with a longer lead time, elsewhere and/or take pre-orders, I'll commit again and I'm sure a lot of the other people who signed up will too.

I think it was unfortunate that they didn't release all the material they've put out in the last few days right at the beginning of the campaign - they'd likely have done better. They've also clearly had a hard time explaining to people what it's for, which is a pity. I don't think the 16 core version by itself is all that interesting from a performance point of view, but I'm interested in the architecture in the hope that they manage to pull off the 64 core version and larger.

EDIT: It's added $20k in the hour since I wrote this - happily it looks like it's got a good chance to succeed.


Several commenters, and the OP, seem to think that this Kickstarter will fail. Having backed quite a few Kickstarter campaigns and watched a lot more, I think this is unlikely.

Backing is concentrated very heavily in the first three days and the last three. Projects that have reached 80% of their funding goal by the last three days are extremely likely to succeed.

It seems that many people delay backing till the last minute. Possibly this is just human nature, though the Kickstarter process also means that as the project progresses more information is released in a steady stream, and often new funding levels are created.

Additionally, backers who really want the project to succeed raise their pledges to help push it over the line.


And no big project like this would ever fail by < $50,000. Wouldn't the founders or their investors or whatever just max their credit cards to see it through?

Does Kickstarter explicitly prohibit such things?


> we see a critical need for a truly open, high-performance computing platform

> FAQ: Will you open source the Epiphany chips?
> Not initially, but it may be considered in the future.

Well, that makes it a lot less interesting than I hoped it would be.


You were hoping for what exactly? VHDL/Verilog for the chips? Netlists?

For most people I'd assume the main thing is that the architecture is well documented and open, as well as the board, and they have released all of the architecture documentation and a lot of other material.

As much as it'd be great to have a market in other sources for the chips, unless/until the architecture has some traction that is pretty irrelevant.


Has anyone played with the 144-core GreenArrays IC?

http://www.greenarraychips.com/


GreenArrays are intriguing, but a completely different animal from this computer. The GreenArrays compute nodes are microscopic by comparison: think 256 bytes of storage, shared between instructions and data. If you can map your problem onto them they seem very efficient.


Only $99 for the first reward that actually will come with a board to play with. Sounds good to me! I've added $119 (international shipping!) to the total, hope they make it...


If the current trend rate continues, they should be able to reach their goal. If they could somehow get on the Reddit front page it would easily happen. I think there are many who might be interested if they only knew.

Here is the trend graph: http://canhekick.it/projects/adapteva/parallella-a-supercomp...


I think the market is telling these guys: we don't care about computing power. People are getting by with iPads and Chromebooks powered by ARM cores with 1/8 the computing power of an Intel processor. Don't get me wrong: if you want to play around with parallel computing you should love this, and support it. Just don't be surprised when it doesn't reach Pebble funding levels.


They want to enable small scale computers to do more powerful computations. I don't think it is aimed directly at iPads and Chromebooks; it could be useful for something like a quadcopter, where there is an algorithm that gives better stability but needs more computational power.


They're actually providing much less computing power than even a low-end PC.


It's alive. Alive! Adapteva reached their target; right now they are at $769,996 pledged against a $750,000 goal, which was cleared on October 27th between 2 and 3 am. Not sure what (US) timezone this refers to.

Source: http://canhekick.it/projects/adapteva/parallella-a-supercomp... (13,700 projects graphed). Great project, Daniel.

Did not delve into past performance of kickstarter projects, but comments from across the net seem to confirm rrreese's comment: "Backing is concentrated very heavily in the first three days and the last three. Projects that have reached 80% of their funding goal by the last three days are extremely likely to succeed."

Canhekick.it lists several to-dos, of which aggregates and prediction would be especially useful.

Any comments on how the funding dynamics of future Kickstarter/other crowdfunding projects would be affected if this data were available?


I'm impressed that they made it as far as they have. $612k puts them in the top tier of all Kickstarter projects, but unfortunately they look to have set their goal too high. Maybe they can pull a Clang and raise a ton of money in the last 24 hours, but I'd be surprised to see that happen. Here's hoping I'm wrong.


I think they will succeed.

They're selling something that's really cool and a lot of people want to see succeed, but for which there aren't any software applications to take advantage of yet.

That's the type of project where a lot of people could donate at the end so that it succeeds. As opposed to, say, a game, where there's a ton of backing at the beginning and then it slowly trails off.


I think they'll make it easily, sort of like what happened to Dalton Caldwell's App.net ... they got a ton of support in the last 12-24 hours.


Maybe this is cynical, but there seems to be little reason for them to not borrow enough from friends and family to collect from Kickstarter, and pay them back immediately after. Unless the gifts are ridiculously expensive?


While I am happy to see another post for this on the front page, I would have preferred a positive one. People jump on bandwagons; I would rather we started a positive bandwagon than one looking for the shovels and a decent grave for an awesome project.


For what it's worth, this is the first I've heard of it, and this convinced me to contribute.


I did my part. I'm very much the archetypal broke college student at the moment, but I won't always be. I have big plans in the Artificial Intelligence and Machine Learning sectors, and I can't imagine a better, cheaper solution to get started on working with multi-agent systems.

I desperately want to see this sort of pricing for cluster computing available in the future, when I have the scratch and knowledge necessary to make these ideas into products.

I think that future is worth skipping the occasional movie or meal to pay into, and I'm looking forward to my somewhat unexpected end of year gift.


Here is one cheaper solution: simulate a multi-agent system on your laptop. You don't need parallel hardware to have multiple agents.


I wonder if this was elaborate reverse engineering of the HN crowd to reach the Parallella financing target :)


Hasn't Amazon already done this with EC2?

(Edit: Written in response to the title.)


Huh? How does this have anything to do with EC2?

Parallella is a dual core ARM board with an FPGA and a 16 core Epiphany CPU (full general purpose CPU cores with 32k of static RAM built into each core - all the cores can access the memory of all other cores as well as system RAM). 1GB RAM total. Expected size around that of a credit card.

The main purpose is the Epiphany CPU, which they also have a 64 core version of. Their problem is that their current CPUs are produced using a process that gives them very low yields and very high per-CPU cost. The main goal of the campaign is to enable them to switch to a much higher yield process and bring the per-chip cost of the 16 core version down to a few dollars per chip.

Their long term roadmap is boards with 1000+ cores.


Yeah, the cloud is much more accessible "for everyone" than this chip. If you want to learn parallel programming rent a GPU instance.


A GPU has a totally different architecture and programming model. It is interesting for very different types of uses.


Who rents GPUs?


Argh! I get paid in a couple of days! I was hoping the cash would go into my account before this runs out. Looks like that might not happen now.... Why couldn't you have given us another few days? Oh well. I bet it'll get funded.


Well, they made it and more: Funding: $788,138 of $750,000


It's a shame they're not making something useless like a video game. Then they'd have millions.


Video games are exactly where heterogeneous computers like this have flourished. But they don't have a game to market it with yet - historically it hasn't even mattered much if that initial game used the platform anywhere near well. How did the Ouya sell, for instance? (It really resembles a Nexus 7 with a broken screen.)



