The really interesting upcoming LLM products are from AMD and Intel... with catc...

YetAnotherNick · on March 21, 2023

> DDR5 DIMMS directly

That's the problem. Good DDR5 RAM's memory speed is <100GB/s, while nvidia could has up to 2TB/s, and still the bottleneck lies on memory speed for most applications.

brucethemoose2 · on March 21, 2023

Not if the bus is wide enough :P. EPYC Genoa is already ~450GB/s, and the M2 max is 400GB/s.

Anyway, what I was implying is that simply fitting a trillion parameter model into a single pool is probably more efficient than splitting it up over a power hungry interconnect. Bandwidth is much lower, but latency is also slower, you are shuffling much less data around.

virtuallynathan · on March 21, 2023

Grace can be paired with Hopper via a 900GB/s NVLINK bus (500GB/s memory bandwidth), 1TB of LPDDR5 on the CPU and 80-94GB of HBM3 on the GPU.

brucethemoose2 · on March 21, 2023

That does sound pretty good, but its still going chip to chip over NVLink.