You should instead look out for the 1024-core Epiphany. It's a streaming processor with no cache hierarchy, so it may not perform as well on random memory access, but it's very close to your previous comment about "Something on the order of 256 or more 1 GHz MIPS, ARM, or even early PowerPC processors."
That's really cool, thank you! I would actually prefer a cacheless architecture like that, because I don't think a cache really has a place in streaming or message-passing paradigms like Erlang or Go. (It can still be relevant within the local address space of each process, but I don't feel that the gain is worth it in most cases.) Plus, the problem space is still large, so it might be better to let people discover alternative approaches to data locality like MapReduce/sharding, copy-on-write, and content-addressable memory.
I spent my teens writing blitters for shareware games and found that, even then, the cache mostly got in the way. Processors like the PowerPC 603e had a pretty substantial cache-miss penalty, on the order of 5-20% for me depending on the situation. It was difficult to come up with appropriate cache hints for even relatively minor random access. I tried disabling the cache, but that made it even slower than a 601. So that's where my head is at, and the Epiphany sounds perfect. Here's a quick link for anyone curious: