Hacker News new | past | comments | ask | show | jobs | submit login

Forgive me if this should be obvious, but why would a simple read from a pointer take so many cycles?



A slightly more elaborate answer than the sibling post to drive home how much happens on a simple read that is not cached :

- request to L1D cache, misses

- request to L2D cache, misses

- packet is dropped on the mesh network to access L3D, likely misses

- L3D requests load from memory from the memory controller, load is put in queue

- dram access latency ~100-150

- above chain in reverse

This is the best case scenario on miss, because there could be a DTLB miss on the address (which is why huge tables are crucial in the paper) or there could be dirty cache lines somewhere in other cores that trigger the coherency mechanism.


because you have to fetch it from RAM, unless the problem is small enough to fit in cache


Ah, right. I don't know how I forgot about cache and RAM.


> because you have to fetch it from RAM, unless the problem is small enough to fit in cache

... might have to...




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: