... how high did you expect CPU core count to be at this point with close to uniform memory access costs, and what changes did you expect to enable that degree of concurrent bus access without catastrophic stalling due to contention?
I was thinking of GPU cores having random write access, but yes that has concurrent access problems.
BTW, how do GPUs solve the concurrent bus access problem you mention, with many cores writing pixels at once?
Also, is this GPU-style processing, where each core writes its own separate result (a texture/display pixel or transformed vector) but can read from anywhere (e.g. from textures), the only approach to parallelization that works? It's essentially the gather half of scatter-gather.
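A minimal sketch of the two access patterns being contrasted, assuming a toy thread-pool model rather than real GPU hardware. The function names `gather` and `scatter_add` are hypothetical. In `gather`, each task writes only its own output slot and may read the input anywhere, so writes never collide. In `scatter_add`, several tasks may target the same output slot. A per-slot `threading.Lock` stands in for the hardware atomic add a GPU would use. Python threads here illustrate only the data-access pattern, not real parallel speed.

```python
from concurrent.futures import ThreadPoolExecutor
import threading


def gather(src, idx, workers=4):
    """Gather: output i is written by exactly one task, which may read src anywhere."""
    out = [None] * len(idx)

    def task(i):
        out[i] = src[idx[i]]  # random read, private write: no write conflicts

    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(task, range(len(idx))))  # list() surfaces any task exception
    return out


def scatter_add(src, idx, size, workers=4):
    """Scatter: task i adds src[i] into out[idx[i]]; several tasks may hit one slot."""
    out = [0] * size
    # Assumption: one lock per output slot as a stand-in for a per-address atomic add.
    locks = [threading.Lock() for _ in range(size)]

    def task(i):
        j = idx[i]
        with locks[j]:  # serialize colliding writes to the same slot
            out[j] += src[i]

    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(task, range(len(src))))
    return out


src = [10, 20, 30, 40]
print(gather(src, [3, 0, 0, 2]))         # [40, 10, 10, 30]
print(scatter_add(src, [1, 1, 0, 1], 2))  # [30, 70]
```

The gather version needs no synchronization at all, which is why it scales so easily. The scatter version is only correct because colliding writes are serialized, and that serialization is exactly where contention shows up as the number of workers grows.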