Cache misses are independent of language; however, C++ gives you a lot of control over where items are placed in memory.

To create a cache miss on a processor, all you need to do is force it to look at M different memory addresses that map to the same cache set, where M is higher than the N-way associativity of your cache. The specifics differ for each processor, but as an example, Intel Sandy Bridge CPUs have an 8-way associative L1 data cache, so if you fetch 9 addresses that are 4096 bytes apart you will cause a cache miss.
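Here is a minimal sketch of that conflict-miss pattern (not from the original comment), assuming a Sandy Bridge-like 32 KB, 8-way, 64-byte-line L1 data cache, so that addresses 4096 bytes apart land in the same set; the stride, way count, and iteration count are assumptions for illustration:

    #include <chrono>
    #include <cstddef>
    #include <cstdio>
    #include <vector>

    int main() {
        constexpr std::size_t kStride = 4096;     // addresses this far apart map to the same L1 set (assumption)
        constexpr std::size_t kLines  = 9;        // one more than the assumed 8-way associativity
        constexpr std::size_t kIters  = 10000000;

        std::vector<char> buf(kStride * kLines, 1);
        volatile char sink = 0;

        auto start = std::chrono::steady_clock::now();
        for (std::size_t i = 0; i < kIters; ++i)
            sink = sink + buf[(i % kLines) * kStride];  // cycles through 9 conflicting lines, so each access misses
        auto end = std::chrono::steady_clock::now();

        double ns = std::chrono::duration<double, std::nano>(end - start).count() / kIters;
        std::printf("%.2f ns per access\n", ns);
        return 0;
    }

Dropping kLines to 8 (within the assumed associativity) should make the same loop run noticeably faster, which is an easy way to see the effect on your own machine.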

Lots of small tips and links if you Google, e.g.

http://stackoverflow.com/questions/8744088/what-is-the-best-...

http://stackoverflow.com/questions/3359524/detecting-cache-m...

The full details are in books like http://www.intel.co.uk/content/www/uk/en/architecture-and-te...




Another interesting case: suppose you have a contiguous array of 48-byte objects, and your cache line is 64 bytes. Then half of your objects will straddle two cache lines. For random access to any one such object, padding them out to 64 bytes each will run faster on average, as each access touches only one cache line and evicts fewer lines. But you can only fit 3/4 as many objects in your cache now, so if you need to access a lot of them then your program will run slower.
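A minimal sketch of that padding trade-off, assuming a 64-byte cache line; the struct names and field layout are made up for illustration. alignas(64) pads each object to a full line, so a random access touches one line instead of sometimes two, at the cost of fitting only 3/4 as many objects in cache:

    #include <cstdint>

    struct Obj48 {
        std::uint64_t fields[6];   // 48 bytes of payload; in an array, half the elements straddle two 64-byte lines
    };

    struct alignas(64) Obj64 {
        std::uint64_t fields[6];   // same 48 bytes of payload, padded out to a full cache line
    };

    static_assert(sizeof(Obj48) == 48, "unpadded: 4 objects per 3 cache lines");
    static_assert(sizeof(Obj64) == 64, "padded: exactly one cache line per object");

Which layout wins depends on your access pattern, which is exactly the "factors that pull in different directions" point below.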

So, "profile, profile, profile", but also understand factors like this that pull in different directions in different scenarios.



