And 4) how to handle problems that don't map well onto a massively parallel machine. Some problems / algorithms are inherently serial by nature.
Some applications like games decompose relatively easily into separate jobs like audio, video, user input, physics, networking, prefetching game data etc. Further decomposing those tasks... not so easy. So e.g. 4..8 cores are useful. 100 or 1k+ cores otoh... hmmm.
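To put rough numbers on that intuition, here is a minimal sketch using Amdahl's law, which bounds the speedup when only part of the work can run in parallel. The 90% parallel fraction is purely an assumption for illustration, not a measurement of any real game, and `amdahl_speedup` is a hypothetical helper name.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Upper bound on speedup from Amdahl's law.

    parallel_fraction: share of the work that can be spread across cores.
    cores: number of cores available.
    """
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


# Assumption for illustration only: 90% of the per-frame work parallelizes.
PARALLEL_FRACTION = 0.90

for cores in (4, 8, 100, 1000):
    bound = amdahl_speedup(PARALLEL_FRACTION, cores)
    print(f"{cores:>5} cores: at most {bound:.2f}x faster")
```

Under that assumption the bound never reaches 10x no matter how many cores you add, because the serial 10% dominates. Most of the gain is already there at a handful of cores.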
True, but I'd expect that for a chip like that you'd do what the Parallella did, or what we do with CPU + GPU, and pair it with a chip that has a smaller number of higher-powered cores. E.g. the Parallella had 2x ARM cores along with the 16x Epiphany cores.