
That's because on (most) modern systems, main RAM is unified (a shared memory space) and external to the CPU, connected through a fat but high-latency pipe.

A solution is to include RAM with each CPU core on-die. Afaik this is an uncommon approach because the semiconductor fabrication processes for RAM vs. CPU don't match well? But it's not impossible - ICs like this exist, and e.g. SoCs with integrated RAM, CPU caches etc. are a thing.

So imagine a 'compute module' consisting of a 6502-class CPU core + a few (dozen?) KB of RAM directly attached + some peripheral I/O to talk to neighbouring CPUs.

Now imagine a matrix of 1000s of such compute modules, integrated on a single silicon die, all concurrently operating @ GHz clock speeds. Sounds like a GPU, with main RAM distributed across its compute units. :-)

Examples:

  GreenArrays GA144
  Cerebras Wafer Scale Engine
(not sure about the nature of the 'CPU' cores on the latter: general purpose, or minimalist and specialized for AI/neural network processing?)
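
To make the 'matrix of compute modules' idea concrete, here's a minimal Python sketch (toy code, not modeled on the GA144, Cerebras, or any real part): each module computes only against its own directly attached RAM, and exchanges data only with its four nearest neighbours.

  GRID = 4  # 4x4 mesh for illustration; the idea scales to 1000s of cores

  class Module:
      """One compute module: tiny core + local RAM + mailboxes to neighbours."""
      def __init__(self, x, y):
          self.x, self.y = x, y
          self.ram = [0] * 1024   # 'a few KB' of directly attached RAM
          self.inbox = []         # words received from neighbouring modules

  def neighbours(x, y):
      # 4-connected mesh: a module can only talk to adjacent modules
      for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
          nx, ny = x + dx, y + dy
          if 0 <= nx < GRID and 0 <= ny < GRID:
              yield nx, ny

  mesh = {(x, y): Module(x, y) for x in range(GRID) for y in range(GRID)}

  # one lockstep cycle: local compute, then a one-word exchange with neighbours
  for (x, y), m in mesh.items():
      m.ram[0] = x * GRID + y                    # 'compute' in local RAM only
  for (x, y), m in mesh.items():
      for nx, ny in neighbours(x, y):
          mesh[(nx, ny)].inbox.append(m.ram[0])  # nearest-neighbour I/O only

  print(mesh[(1, 1)].inbox)   # an inner module hears from its 4 neighbours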

Afaik the 'problem' is more how to program such systems easily / securely, and how to arrange common peripherals like USB, storage, video output etc. in a practical manner so as to utilize the immense compute power + memory bandwidth of such a system.
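
That difficulty shows up even in the toy model above (hypothetical sketch, same mesh): with only nearest-neighbour links, reading a value from a far-away module becomes an explicit multi-hop routing problem that the programmer (or compiler) has to solve.

  def route(src, dst):
      """Naive dimension-ordered (X-then-Y) route through the mesh."""
      x, y = src
      hops = []
      while x != dst[0]:
          x += 1 if dst[0] > x else -1
          hops.append((x, y))
      while y != dst[1]:
          y += 1 if dst[1] > y else -1
          hops.append((x, y))
      return hops

  # a word in module (0,0)'s RAM must be forwarded 6 times to reach (3,3)
  print(route((0, 0), (3, 3)))  # [(1, 0), (2, 0), (3, 0), (3, 1), (3, 2), (3, 3)]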




> Now imagine a matrix of 1000s of such compute modules, integrated on a single silicon die, all concurrently operating @ GHz clock speeds. Sounds like a GPU, with main RAM distributed across its compute units. :-)

Sounds like a new version of the Connection Machine (the classic / original one).



