Hacker News

It's hard.

The author has reinvented the architecture used by Cell (PS3) and Intel IXP. Both of these architectures are dead specifically because they are too damn hard to program for compared with a multicore ARM/x86 chip.

GPUs would be the most successful modern implementation of this idea. There are opportunities with FPGAs, but GPU silicon is so far ahead (and still advancing fast) that you're usually better off designing for GPUs.

You could also consider Cavium parts (16-64 way ARM chips) which ship today in high-end network hardware.

The common lessons across all of these are that:

* Memory is slow compared with computation

* Put caches everywhere and have the machine decide what to cache

* Synchronisation is hard and puts tremendous load on your scarce memory resources

* It's much easier to do the same job on different data a million times than to do a million different jobs on the same data. In other words, high throughput is easier to achieve than low latency.
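A toy illustration of that last lesson, sketched in Python (the closure-per-element trick is just a stand-in for genuine per-job dispatch, and the timings are relative, not absolute):

```python
from timeit import timeit

data = list(range(100_000))

# Same job on different data: one uniform loop, which a compiler,
# SIMD unit, or GPU can batch across many elements at once.
def same_job():
    return [x * 2 for x in data]

# Many nominally "different" jobs: one closure per element forces
# per-item dispatch, defeating batching (and caches, and branch
# predictors) even though the work done is identical.
jobs = [(lambda x, _k=k: x * 2) for k in range(100_000)]
def different_jobs():
    return [f(x) for f, x in zip(jobs, data)]

print("uniform:   ", timeit(same_job, number=10))
print("dispatched:", timeit(different_jobs, number=10))
```

On typical CPython the dispatched version is noticeably slower despite computing the same result, which is the throughput-vs-latency point in miniature.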




I'm not sure if those architectures are comparable to the one discussed in the article, except that both are highly parallel. GPUs and Cell are, as you mention, data-parallel.

The article talks about a much more "anarchistic" parallelism where thousands of objects, each different in both code and data, are doing their own thing, sending messages to each other when necessary. I guess Erlang/Elixir's lightweight processes are the closest thing currently, as mentioned in the article.
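That free-form, message-passing style can be sketched with ordinary Python threads and queues standing in for Erlang's lightweight processes and mailboxes (the Actor class here is illustrative, not any real framework's API):

```python
import threading
import queue

class Actor:
    """Owns private state plus a mailbox. State is only ever touched
    by the actor's own thread, so no locks are needed."""
    def __init__(self):
        self.mailbox = queue.Queue()
        self.total = 0
        threading.Thread(target=self._run, daemon=True).start()

    def _run(self):
        while True:
            msg = self.mailbox.get()
            if msg is None:              # shutdown message
                return
            reply_to, amount = msg
            self.total += amount
            reply_to.put(self.total)     # reply by message, never shared memory

counter = Actor()
replies = queue.Queue()
counter.mailbox.put((replies, 5))
counter.mailbox.put((replies, 7))
print(replies.get(), replies.get())      # 5, then 12
counter.mailbox.put(None)
```

Each actor serialises access to its own state simply by processing its mailbox one message at a time; Erlang processes work the same way, just far cheaper than OS threads.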


Cell's SPUs and IXP's microengines aren't data-parallel any more than a regular CPU. They're minimal CPUs with local RAM and fast connectivity between each other (usually a FIFO queue).
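The minimal-CPUs-joined-by-FIFOs layout can be sketched as a software pipeline, with a queue.Queue standing in for each hardware queue between two cores (a loose analogy, not the actual Cell/IXP programming model):

```python
import threading
import queue

def stage(fn, inbox, outbox):
    """One 'core': loop on the input FIFO, push results downstream."""
    def run():
        while True:
            item = inbox.get()
            if item is None:           # end-of-stream marker
                outbox.put(None)
                return
            outbox.put(fn(item))
    threading.Thread(target=run, daemon=True).start()

q0, q1, q2 = queue.Queue(), queue.Queue(), queue.Queue()
stage(lambda x: x + 1, q0, q1)   # "core" 1
stage(lambda x: x * x, q1, q2)   # "core" 2

for x in [1, 2, 3]:
    q0.put(x)
q0.put(None)

out = []
while (item := q2.get()) is not None:
    out.append(item)
print(out)   # [4, 9, 16]
```

Each stage runs independently and only communicates through its queues, which is roughly how IXP microengines were chained for packet processing.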

Every single one of the CPUs was independent and happy to run as many branches and/or memory accesses as you want without a significant performance penalty, unlike modern GPUs.

So yeah, you could put different objects on different CPUs if you want. Except that's not where the bottleneck is, in either energy or computation. Remember that the local RAM needs power to retain state (ignoring FeRAM), so those CPUs are no longer free; you have to commit objects back to main DRAM before switching a CPU off. At which point you've just reinvented caching and might as well run on a fast few-core CPU anyway.




