512 MB of cache, wow. A couple years ago I noticed that some Xeons I was using h...

zamadatix · 2024-10-12T18:25:57.000000Z

CCDs can't access each other's L3 cache as their own (the fabric penalty is too high to do that directly). Assuming it's anything like the 9174F, that means it's really 8 groups of 2 cores that each have 64 MB of L3 cache. Still enormous, and you can still access data over the Infinity Fabric with penalties, but it's not quite the single 512 MB block of cache on one 16-core block that it might sound like at first.

Zen 4 also had 96 MB per CCD variants like the 9184X, so 768 MB per socket, and they are dual socket, so you can end up with 1.5 GB of total L3 cache in a single machine! The downside is that beyond CCD<->CCD latencies you now have socket<->socket latencies.
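For anyone who wants to check the arithmetic above, here is a minimal sketch. The helper name `total_l3_mb` is made up for illustration, and the CCD counts are the ones quoted in this comment (8 CCDs per socket), not values read from any spec sheet.

```python
def total_l3_mb(ccds: int, l3_per_ccd_mb: int, sockets: int = 1) -> int:
    """Total L3 in MB, assuming every CCD carries the same amount of L3.

    Note this is only a sum: each CCD still sees just its own slice
    as local cache, so the total is not one shared pool.
    """
    return ccds * l3_per_ccd_mb * sockets


# Figures quoted in the comment above (assumed, not measured):
per_socket_64 = total_l3_mb(ccds=8, l3_per_ccd_mb=64)             # 8 x 64 MB
per_socket_96 = total_l3_mb(ccds=8, l3_per_ccd_mb=96)             # 8 x 96 MB
dual_socket_96 = total_l3_mb(ccds=8, l3_per_ccd_mb=96, sockets=2)

print(per_socket_64, per_socket_96, dual_socket_96, dual_socket_96 / 1024)
```

The last value is the dual-socket total expressed in GB.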

edward28 · 2024-10-12T22:54:14.000000Z

It's actually 16 CCDs with a single core and 32MB each.

nullc · 2024-10-13T17:29:02.000000Z

9684X is 1152 MB of cache per socket, 12 CCDs * 96 MB. A similar X-series Zen 5 is planned.

Though I wish they did some chips with 128 GB of high-bandwidth DRAM instead of extra-sized SRAM caches.

bee_rider · 2024-10-13T02:52:57.000000Z

Hmm. Ok, instead of treating the cache as ram, we will have to treat each CCD as a node, and treat the chip as a cluster. It will be hard, but you can fit quite a bit in 64MB.
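A rough sketch of what "each CCD is a node" could look like as a planning step. Both helpers (`ccd_core_groups`, `shard_sizes`) are hypothetical, and the contiguous core numbering is an assumption; on a real machine you would read the topology from the OS instead of guessing.

```python
from typing import List


def ccd_core_groups(total_cores: int, cores_per_ccd: int) -> List[List[int]]:
    """Group core IDs into per-CCD 'nodes'.

    Assumes cores are numbered contiguously per CCD, which is an
    illustration only and may not match the real topology.
    """
    if cores_per_ccd <= 0 or total_cores % cores_per_ccd != 0:
        raise ValueError("total_cores must be a positive multiple of cores_per_ccd")
    return [
        list(range(start, start + cores_per_ccd))
        for start in range(0, total_cores, cores_per_ccd)
    ]


def shard_sizes(dataset_mb: int, l3_per_ccd_mb: int) -> List[int]:
    """Split a working set into shards that each fit in one CCD's L3."""
    full, rest = divmod(dataset_mb, l3_per_ccd_mb)
    return [l3_per_ccd_mb] * full + ([rest] if rest else [])


nodes = ccd_core_groups(total_cores=16, cores_per_ccd=2)  # 8 "nodes" of 2 cores
shards = shard_sizes(dataset_mb=200, l3_per_ccd_mb=64)    # one shard per node
print(len(nodes), nodes[0], shards)
```

Each shard would then be handed to the worker pinned to one node's cores (on Linux, `os.sched_setaffinity` is one way to do the pinning), so that its working set stays in that CCD's L3.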

hedora · 2024-10-12T15:33:29.000000Z

I wonder if you can boot it without populating any DRAM sockets.

lewurm · 2024-10-12T17:50:36.000000Z

Firmware is using cache as RAM (e.g. https://www.coreboot.org/images/6/6c/LBCar.pdf) to do early init, like DRAM training. I guess later stages in the boot chain probably rely on DRAM being set up, though.

bee_rider · 2024-10-12T15:50:25.000000Z

I would be pretty curious about such a system. Or, maybe more practically, it might be interesting to have a system that pretends the L3 cache is RAM, and the RAM is the hard drive (in particular, RAM could disguise itself as the swap partition, so the OS would treat it as basically a chunk of RAM that it would rather not use).

compressedgas · 2024-10-12T16:01:57.000000Z

Philip Machanick's RAMpage! (ca. 2000)

> The RAMpage memory hierarchy is an alternative to a conventional cache-based hierarchy, in which the lowest-level cache is managed as a paged memory, and DRAM becomes a paging device.

afr0ck · 2024-10-12T22:30:26.000000Z

So, essentially, you're just doing cache eviction in software. That's obviously a lot of overhead, but at least it gives you eviction control. However, there is very little to do when it comes to cache eviction. The algorithms are all well known and there is little innovation in that space. So baking that into the hardware is always better, for now.
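To make "cache eviction in software" concrete, here is a toy sketch of a small fast memory managed as pages with LRU eviction, where a dict stands in for DRAM as the paging device. The class name `SoftwarePagedCache` and the choice of LRU are my assumptions for illustration; this is not RAMpage's actual design.

```python
from collections import OrderedDict


class SoftwarePagedCache:
    """Toy model: fast memory managed as pages, with LRU eviction in software."""

    def __init__(self, capacity_pages: int):
        if capacity_pages < 1:
            raise ValueError("capacity_pages must be at least 1")
        self.capacity = capacity_pages
        self.resident = OrderedDict()  # page id -> data, least recently used first
        self.backing = {}              # stands in for DRAM used as a paging device
        self.faults = 0

    def access(self, page: int) -> str:
        if page in self.resident:
            # Hit: mark the page as most recently used.
            self.resident.move_to_end(page)
            return self.resident[page]
        # Miss: this is the "page fault" handled in software.
        self.faults += 1
        if len(self.resident) >= self.capacity:
            # Evict the least recently used page to the backing store.
            victim, data = self.resident.popitem(last=False)
            self.backing[victim] = data
        # Page in from the backing store, or create the page on first touch.
        data = self.backing.pop(page, f"page-{page}")
        self.resident[page] = data
        return data


cache = SoftwarePagedCache(capacity_pages=2)
for p in [1, 2, 1, 3, 2]:
    cache.access(p)
print(cache.faults, list(cache.resident), sorted(cache.backing))
```

Every miss here runs through ordinary code, which is the overhead mentioned above; the upside is that the eviction policy is just a few lines you can swap out.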

edward28 · 2024-10-12T22:56:47.000000Z

Intel has such a CPU in the previous gen, called the Xeon Max, with up to 64 GB of HBM on chip. It could use it as a cache or just as memory.

dmitrygr · 2024-10-14T19:06:55.000000Z

That would require either rewriting drivers to never use DMA or making sure that all DMA controllers are able to write into and read from L3 directly.