Back when I still read architecture docs like novels, there was a group experimenting with processor-in-memory architectures. Their
demo had memory chips doing vector processing of data in parallel.
I wonder how wide SIMD has to get before you treat it like a CPU embedded into cache memory.
Though I guess we are already looking at SIMD instructions wider than a cache line…