Forgive me if this should be obvious, but why would a simple read from a pointer...

yvdriess · on March 6, 2020

A slightly more elaborate answer than the sibling post, to drive home how much happens on a simple read that is not cached:

- request to L1D cache, misses

- request to L2D cache, misses

- request packet is sent over the mesh network to access L3D, likely misses

- L3D requests the load from the memory controller, where the load is put in a queue

- DRAM access, latency ~100-150

- above chain in reverse

This is the best-case scenario on a miss, because there could also be a DTLB miss on the address (which is why huge pages are crucial in the paper), or there could be dirty cache lines in other cores that trigger the coherency mechanism.
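
To make the chain above concrete, here is a toy cost model, purely as a sketch. The per-step numbers are made-up placeholders in unspecified time units, not measurements of any real CPU, and the names `STEPS` and `load_cost` are hypothetical. Each entry is the extra cost added by that step, so a load that has to go all the way to DRAM pays for every miss on the way down. TLB misses and coherency traffic are left out entirely.

```python
# Toy model of the miss chain: each step adds its own incremental cost.
# All numbers are illustrative assumptions, NOT measured values.
STEPS = [
    ("L1D", 4),     # assumed cost of the L1D lookup
    ("L2", 14),     # assumed extra cost of the L2 lookup after an L1D miss
    ("L3", 50),     # assumed extra cost of the mesh hop plus L3 lookup
    ("DRAM", 125),  # assumed extra cost of the memory controller and DRAM
]


def load_cost(served_by, steps=STEPS):
    """Return the summed cost of every step up to the one that serves the load."""
    total = 0
    for name, cost in steps:
        total += cost
        if name == served_by:
            return total
    raise ValueError(f"unknown level: {served_by}")


if __name__ == "__main__":
    for name, _ in STEPS:
        print(f"load served by {name}: {load_cost(name)} units")
```

The point of the sketch is only the shape: the cost of a read depends on how far down the hierarchy it has to travel, and the DRAM case dwarfs the L1D case even before anything goes wrong.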

erikmolin · on March 6, 2020

because you have to fetch it from RAM, unless the problem is small enough to fit in cache

johnlorentzson · on March 6, 2020

Ah, right. I don't know how I forgot about cache and RAM.

signa11 · on March 6, 2020

> because you have to fetch it from RAM, unless the problem is small enough to fit in cache

... might have to...