
I've always wondered: why do CPU caches only key off the low-order bits of the memory address? By contrast, DDR controllers generally key off a linear hash of most of the address bits.

Is gate count really so tight in L1 that they can't throw a few XOR taps in front of the cache bus? Or is it simply to make cache collisions more predictable?
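
To make the question concrete, here's a toy sketch (not modeled on any real CPU) of the two schemes: plain bit-slice indexing versus a few XOR taps folding higher address bits into the set index.

    /* Toy sketch, not any real CPU's scheme: plain bit-slice indexing
       vs. a few XOR taps mixing higher address bits into the set index. */
    #include <stdint.h>
    #include <stdio.h>

    #define LINE_BITS 6   /* 64-byte cache lines */
    #define SET_BITS  6   /* 64 sets, e.g. a 32 KiB 8-way L1 */

    /* Conventional L1 indexing: the set is just a low-order bit field. */
    static unsigned set_plain(uint64_t addr) {
        return (addr >> LINE_BITS) & ((1u << SET_BITS) - 1);
    }

    /* Hypothetical hashed indexing: XOR the next field up into the index. */
    static unsigned set_hashed(uint64_t addr) {
        uint64_t lo = addr >> LINE_BITS;
        uint64_t hi = addr >> (LINE_BITS + SET_BITS);
        return (lo ^ hi) & ((1u << SET_BITS) - 1);
    }

    int main(void) {
        /* Addresses exactly one index-range stride apart (4 KiB here)
           collide under plain indexing but not under the hashed variant. */
        uint64_t a = 0x10000;
        uint64_t b = a + (1u << (LINE_BITS + SET_BITS));
        printf("plain:  %u vs %u\n", set_plain(a), set_plain(b));
        printf("hashed: %u vs %u\n", set_hashed(a), set_hashed(b));
        return 0;
    }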




Gate count is not the issue - the issue is L1 timing. For clocks at 2 GHz plus, there can be very few stages of logic between flops. Just decoding an address to a one-hot wire (necessary to access the memory bank) takes up a good chunk of that time interval. Reading the bits out and resolving them back to full rail also takes time. Checking for a tag match, more time. Routing the result back to the register file, yet more time. Beyond the L1, say in a shared L3, the time of flight across the chip alone is multiple clock cycles, on top of all the aforementioned penalties and the longer access latency of the much larger shared array.

Relatedly, in the L3 there is a hash function used to distribute different addresses to different regions (slices) of the L3. The cost of doing this is less significant for two reasons: the L3 access latency is already much higher (as elaborated above), and the hash calculation can be done in parallel with other required logic (e.g. in parallel with the L2 access).
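
As a rough illustration only (real L3 slice hashes are undocumented and considerably more involved than this), a slice selector might XOR-fold physical address bits down to a slice id while the L2 lookup is still in flight:

    /* Hypothetical slice hash for illustration; actual hardware hashes
       are undocumented. XOR-fold the line address so that most address
       bits influence the slice id. */
    #include <stdint.h>
    #include <stdio.h>

    #define SLICE_BITS 3   /* e.g. 8 L3 slices */

    static unsigned l3_slice(uint64_t paddr) {
        uint64_t h = paddr >> 6;   /* drop the within-line offset */
        h ^= h >> 17;              /* fold high bits onto low bits */
        h ^= h >> 31;
        return (unsigned)(h & ((1u << SLICE_BITS) - 1));
    }

    int main(void) {
        /* Consecutive lines spread across the slices. */
        for (uint64_t p = 0; p < 8 * 64; p += 64)
            printf("line 0x%03llx -> slice %u\n",
                   (unsigned long long)p, l3_slice(p));
        return 0;
    }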


Thanks, that was a great answer.


The addressing is designed to take advantage of spatial locality - adjacent addresses land in adjacent cache sets, so sequential accesses spread across the whole cache instead of colliding in one set.
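
A minimal demo of that property, assuming 64-byte lines and 64 sets: consecutive cache lines map to consecutive sets, so a streaming scan uses the whole cache.

    /* Minimal demo (assuming 64-byte lines, 64 sets): consecutive lines
       map to consecutive sets under low-order-bit indexing. */
    #include <stdint.h>
    #include <stdio.h>

    int main(void) {
        for (uint64_t addr = 0; addr < 4 * 64; addr += 64)
            printf("line at 0x%03llx -> set %llu\n",
                   (unsigned long long)addr,
                   (unsigned long long)((addr >> 6) & 63));
        return 0;
    }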



