Hacker News

It's hard.

The author has reinvented the architecture used by Cell (PS3) and Intel IXP. Both of these architectures are dead specifically because they are too damn hard to program for compared with a multicore ARM/x86 chip.

GPUs would be the most successful modern implementation of this idea. There are opportunities with FPGAs, but GPU silicon is so far ahead (and still advancing fast) that you're usually better off designing for GPUs.

You could also consider Cavium parts (16-64 way ARM chips) which ship today in high-end network hardware.

The common lessons across all of these are that:

* Memory is slow compared with computation

* Put caches everywhere and have the machine decide what to cache

* Synchronisation is hard and puts tremendous load on your scarce memory resources

* It's much easier to do the same job on different data a million times than to do a million different jobs on the same data. In other words, high throughput is easier to achieve than low latency.
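A toy illustration of that last lesson, sketched in Python (the closure-per-element trick is just a stand-in for genuine per-job dispatch, and the timings are relative, not absolute):

```python
from timeit import timeit

data = list(range(100_000))

# Same job on different data: one uniform loop, which a compiler,
# SIMD unit, or GPU can batch across many elements at once.
def same_job():
    return [x * 2 for x in data]

# Many nominally "different" jobs: one closure per element forces
# per-item dispatch, defeating batching (and caches, and branch
# predictors) even though the work done is identical.
jobs = [(lambda x, _k=k: x * 2) for k in range(100_000)]
def different_jobs():
    return [f(x) for f, x in zip(jobs, data)]

print("uniform:   ", timeit(same_job, number=10))
print("dispatched:", timeit(different_jobs, number=10))
```

On typical CPython the dispatched version is noticeably slower despite computing the same result, which is the throughput-vs-latency point in miniature.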




I'm not sure if those architectures are comparable to the one discussed in the article, except that both are highly parallel. GPUs and Cell are, as you mention, data-parallel.

The article talks about a much more "anarchistic" parallelism where thousands of objects, each different in both code and data, are doing their own thing, sending messages to each other when necessary. I guess Erlang/Elixir's lightweight processes are the closest thing currently, as mentioned in the article.
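That free-form, message-passing style can be sketched with ordinary Python threads and queues standing in for Erlang's lightweight processes and mailboxes (the Actor class here is illustrative, not any real framework's API):

```python
import threading
import queue

class Actor:
    """Owns private state plus a mailbox. State is only ever touched
    by the actor's own thread, so no locks are needed."""
    def __init__(self):
        self.mailbox = queue.Queue()
        self.total = 0
        threading.Thread(target=self._run, daemon=True).start()

    def _run(self):
        while True:
            msg = self.mailbox.get()
            if msg is None:              # shutdown message
                return
            reply_to, amount = msg
            self.total += amount
            reply_to.put(self.total)     # reply by message, never shared memory

counter = Actor()
replies = queue.Queue()
counter.mailbox.put((replies, 5))
counter.mailbox.put((replies, 7))
print(replies.get(), replies.get())      # 5, then 12
counter.mailbox.put(None)
```

Each actor serialises access to its own state simply by processing its mailbox one message at a time; Erlang processes work the same way, just far cheaper than OS threads.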


Cell's SPUs and IXP's microengines aren't data-parallel any more than a regular CPU. They're minimal CPUs with local RAM and fast connectivity between each other (usually a FIFO queue).
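The minimal-CPUs-joined-by-FIFOs layout can be sketched as a software pipeline, with a queue.Queue standing in for each hardware queue between two cores (a loose analogy, not the actual Cell/IXP programming model):

```python
import threading
import queue

def stage(fn, inbox, outbox):
    """One 'core': loop on the input FIFO, push results downstream."""
    def run():
        while True:
            item = inbox.get()
            if item is None:           # end-of-stream marker
                outbox.put(None)
                return
            outbox.put(fn(item))
    threading.Thread(target=run, daemon=True).start()

q0, q1, q2 = queue.Queue(), queue.Queue(), queue.Queue()
stage(lambda x: x + 1, q0, q1)   # "core" 1
stage(lambda x: x * x, q1, q2)   # "core" 2

for x in [1, 2, 3]:
    q0.put(x)
q0.put(None)

out = []
while (item := q2.get()) is not None:
    out.append(item)
print(out)   # [4, 9, 16]
```

Each stage runs independently and only communicates through its queues, which is roughly how IXP microengines were chained for packet processing.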

Every single one of the CPUs was independent and happy to run as many branches and/or memory accesses as you want without a significant performance penalty, unlike modern GPUs.

So yeah, you could put different objects on different CPUs if you want. Except that's not where the bottleneck is, in either energy or computation. Remember that the local RAM needs power to retain state (ignoring FeRAM), so those CPUs are no longer free; you have to commit objects back to main DRAM before switching a CPU off. At which point you've just reinvented caching and might as well run on a fast few-core CPU anyway.




