Micron has been pursuing something similar for a while. The "CPUs" that you can put on the same die as the memory, however, are very limited in what they can do, owing to the lithography process used for memory chips and the general lack of die space. They also don't get the uniform access to a huge memory range that you'd expect from a "real" CPU, so you have to partition the work to fit the memory access patterns they can actually support. The instruction set is very limited, and floating point can only be emulated (i.e. slow AF, not that you actually need it for neural networks most of the time). The upside is nearly unlimited memory bandwidth and very low pJ/byte, with a few catches.
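To make the partitioning point concrete, here is a rough sketch in plain Python of how a matrix-vector product might be split up for this kind of hardware. Everything in it is made up for illustration: `bank_kernel`, `partition_rows` and `pim_matvec` are hypothetical names, not Micron's (or anyone's) actual API. The assumptions are that each in-memory core can only touch the rows stored in its own bank, that it only does integer multiply-accumulate, and that the host handles quantization and gathers the results.

```python
from typing import List


def quantize(values: List[float], scale: float) -> List[int]:
    """Host-side step: map floats to int8-range integers (round, then clamp)."""
    return [max(-128, min(127, round(v / scale))) for v in values]


def partition_rows(rows: List[List[int]], num_banks: int) -> List[List[List[int]]]:
    """Split rows into contiguous chunks, at most one chunk per (hypothetical) bank."""
    chunk = max(1, -(-len(rows) // num_banks))  # ceiling division, never zero
    return [rows[i:i + chunk] for i in range(0, len(rows), chunk)]


def bank_kernel(local_rows: List[List[int]], x_q: List[int]) -> List[int]:
    """What one in-memory core would run: integer multiply-accumulate
    over the rows in its own bank only. No floats, no remote reads."""
    return [sum(w * x for w, x in zip(row, x_q)) for row in local_rows]


def pim_matvec(weights: List[List[float]], x: List[float],
               num_banks: int, w_scale: float, x_scale: float) -> List[float]:
    """Host-side driver: quantize, scatter rows to banks, gather, dequantize."""
    x_q = quantize(x, x_scale)  # the input vector is broadcast to every bank
    banks = partition_rows([quantize(r, w_scale) for r in weights], num_banks)
    acc: List[int] = []
    for local_rows in banks:  # real hardware would run these in parallel
        acc.extend(bank_kernel(local_rows, x_q))
    return [a * w_scale * x_scale for a in acc]  # floats only on the host


if __name__ == "__main__":
    W = [[1.0, 2.0], [3.0, 4.0], [-1.0, 0.5]]
    print(pim_matvec(W, [1.0, -1.0], num_banks=2, w_scale=0.5, x_scale=1.0))
    # prints [-1.0, -1.0, -1.5]
```

The point of the sketch is the shape of the problem, not performance: the kernel never sees floating point or another bank's data, so anything that doesn't decompose this way (random access across the whole address space, for example) is exactly what gets awkward.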
Don't know if this is similar, but if it is, it's going to be a hard sell, especially in an era when 90% of programmers can't even understand what I wrote above.