
But Zen 2 has a HUGE L3 cache. I'd think that cache would compensate for that latency.



Actually, the L3 cache is also sharded across chiplets, so there's a small (~8MB) local slice of L3 that is fast, while hits in remote slices have to go over AMD's inter-die fabric and incur a serious latency penalty. On first-gen Epyc/Threadripper, non-local L3 hits were almost as slow as DRAM, at ~100ns (!).
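You can see the cliff with a pointer-chasing loop. A minimal sketch, not a rigorous benchmark: the buffer sizes and iteration count are illustrative assumptions, the random cycle exists to defeat the prefetcher, and where exactly the latency jumps depends on how big the local L3 slice is on your part.

```c
/* Pointer-chasing latency sketch (Linux; build with: gcc -O2 chase.c).
 * Once the working set outgrows the local L3 slice, ns/access should
 * step up, and again when it spills out of L3 entirely into DRAM. */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

static volatile void *sink;  /* keeps the chase loop from being optimized out */

static double ns_per_access(size_t bytes, size_t iters) {
    size_t n = bytes / sizeof(void *);
    void **buf = malloc(n * sizeof(void *));
    size_t *perm = malloc(n * sizeof(size_t));

    /* Build one random cycle through the buffer so every load depends
     * on the previous one and the hardware prefetcher can't help. */
    for (size_t i = 0; i < n; i++) perm[i] = i;
    for (size_t i = n - 1; i > 0; i--) {
        size_t j = (size_t)rand() % (i + 1);
        size_t t = perm[i]; perm[i] = perm[j]; perm[j] = t;
    }
    for (size_t i = 0; i < n; i++)
        buf[perm[i]] = &buf[perm[(i + 1) % n]];

    struct timespec t0, t1;
    void **p = buf;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (size_t i = 0; i < iters; i++)
        p = (void **)*p;                  /* serialized, dependent loads */
    clock_gettime(CLOCK_MONOTONIC, &t1);
    sink = p;

    free(perm);
    free(buf);
    return ((t1.tv_sec - t0.tv_sec) * 1e9 + (t1.tv_nsec - t0.tv_nsec)) / iters;
}

int main(void) {
    srand(1);
    for (size_t mb = 1; mb <= 256; mb *= 2)
        printf("%4zu MB: %6.1f ns/access\n", mb,
               ns_per_access(mb << 20, 20u * 1000 * 1000));
    return 0;
}
```

Pin the process to one core (e.g. with taskset) if you want the local/remote distinction to stay stable across runs.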


Does that local vs remote L3 show up in the NUMA information?
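One place to look on Linux is the sysfs cache topology, which at least tells you which CPUs share each L3 instance. A minimal sketch, assuming the common layout where cache index3 is the L3:

```c
/* Print which CPUs share each core's L3, per Linux sysfs.
 * Assumes /sys/devices/system/cpu/cpuN/cache/index3 exists and is
 * the L3, which holds on typical x86 systems. */
#include <stdio.h>

int main(void) {
    char path[128], cpus[256];
    for (int cpu = 0; ; cpu++) {
        snprintf(path, sizeof path,
                 "/sys/devices/system/cpu/cpu%d/cache/index3/shared_cpu_list",
                 cpu);
        FILE *f = fopen(path, "r");
        if (!f) break;                    /* ran past the last CPU */
        if (fgets(cpus, sizeof cpus, f))
            printf("cpu%-3d shares L3 with: %s", cpu, cpus);
        fclose(f);
    }
    return 0;
}
```

`numactl --hardware` shows the node distance table alongside this, but that's per NUMA node, which can be coarser than the per-CCX L3 slices.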


Only partially.


There are diminishing returns on cache size.


Only insofar as there are diminishing returns on increasing memory in general. If you can fit your entire application's instructions in a low-latency block of memory, you're going to see massive benefits over repeatedly swapping portions in and out (where RAM latencies come into play).


Memory access typically follows a Pareto distribution with a long tail. Doubling the cache size only captures accesses further out in that tail, so each increase yields a smaller speedup than the one before it. The actual effect varies by application, but if the data doesn't all fit in cache and access patterns follow that long-tail distribution, it's true that increasing the cache size has diminishing returns. That's the case for almost all applications.
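You can see the shape of this with a toy model. A minimal sketch, assuming independent accesses drawn from a Zipf distribution and an ideal cache that always holds the hottest lines; the working-set size N and exponent s are made-up parameters:

```c
/* Toy model: hit rate of an ideal cache under Zipf-distributed accesses.
 * The cache is assumed to hold the C most popular lines, so the hit rate
 * is just the prefix sum of the popularity distribution. N and s are
 * illustrative assumptions, not measurements. */
#include <stdio.h>
#include <math.h>

int main(void) {
    const int N = 1 << 20;   /* distinct lines in the working set */
    const double s = 1.2;    /* Zipf exponent; >1 gives a heavy but summable tail */
    double run = 0.0, hits[21];
    int next = 1, k = 0;

    for (int i = 1; i <= N; i++) {
        run += pow(i, -s);   /* popularity weight of the i-th hottest line */
        if (i == next) { hits[k++] = run; next *= 2; }
    }

    const double total = run;
    double prev = 0.0;
    printf("%13s  %8s  %9s\n", "cache (lines)", "hit rate", "marginal");
    for (int i = 0, size = 1; i < k; i++, size *= 2) {
        double h = hits[i] / total;
        printf("%13d  %7.2f%%  %+8.2f%%\n", size, 100 * h, 100 * (h - prev));
        prev = h;
    }
    return 0;
}
```

With s > 1 the marginal column shrinks with every doubling, which is the diminishing-returns point; flatter distributions (s <= 1) weaken the effect.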


Sure, but that applies to main memory as well. Ergo, a larger cache still offers a correspondingly larger benefit over main memory; the returns are only diminishing relative to the previous cache increase, not relative to going to RAM.



