Something I don't understand is how to deal with cache coherency when you need the same data in a bunch of different configurations.

Take a typical game loop and assume we have a list of Transforms (e.g. world matrix, translation/rotation/scale, whatever - each Transform is a collection of floats in contiguous memory)

Different systems that run in that loop need those transforms in different orders. Rendering may want to organize it by material (to avoid shader switching), AI may want to organize it by type of state machine, Collision by geometric proximity, Culling for physics and lighting might be different, and the list goes on.

Naive answer is "just duplicate the transforms when they are updated" but that introduces its own complexity and comes at its own cost of fetching data from the cache.
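For concreteness, here is a rough sketch of what that "duplicate per system" approach could look like. Everything in it is hypothetical (the `Transform` layout, `build_order`, `gather`, the idea of a per-system sort key such as a material id); it is not taken from any particular engine. Each system keeps a permutation of entity indices sorted by its own key, then gathers the canonical transforms into a contiguous copy it can walk linearly.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <numeric>
#include <vector>

// Hypothetical layout: a 4x4 world matrix stored as 16 contiguous floats.
struct Transform {
    float m[16];
};

// Build a permutation of entity indices ordered by a per-system sort key
// (e.g. material id for rendering). Stable, so ties keep entity order.
std::vector<std::uint32_t> build_order(const std::vector<std::uint32_t>& sort_keys) {
    std::vector<std::uint32_t> order(sort_keys.size());
    std::iota(order.begin(), order.end(), 0u);
    std::stable_sort(order.begin(), order.end(),
                     [&](std::uint32_t a, std::uint32_t b) {
                         return sort_keys[a] < sort_keys[b];
                     });
    return order;
}

// Copy the canonical transforms into a contiguous, system-specific array
// so that the system can iterate it front to back.
std::vector<Transform> gather(const std::vector<Transform>& canonical,
                              const std::vector<std::uint32_t>& order) {
    std::vector<Transform> packed;
    packed.reserve(order.size());
    for (std::uint32_t i : order) {
        assert(i < canonical.size());
        packed.push_back(canonical[i]);
    }
    return packed;
}
```

The permutation only needs rebuilding when the keys change, so the per-frame cost is mostly the gather itself: one scattered read pass over the canonical array in exchange for linear reads inside the system. Whether that trade pays off is something you would have to measure.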

I guess what I'm getting at is:

1) I would love to learn more about how this problem is tackled by robust game engines (I guess games too - but games have more specific knowledge than engines and can have unique code to handle it)

2) Does it all just come out in the wash at the end of the day? Not saying just throw it all on the heap and don't care... but, maybe say optimizing for one path, like "updating world transforms for rendering", is worth it and then whatever the cost is to jump around elsewhere doesn't really matter?

Sorry if my question is a bit vague... any insight is appreciated




Assume that, once determined at the start of the frame (e.g. camera position changes after user input handling), the transform matrices are not written to again. They can then be freely shared across multiple cores without causing problems with coherency. The cache lines associated with the transforms will be set to 'Shared' across all cores. Cache coherency will start to bite you in the ass in this situation if you start mutating the transforms while other threads are reading them, causing cache invalidations (and potentially pipeline flushes) on every core whose cache holds those lines.
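A minimal sketch of that read-only sharing pattern, with made-up names (`Transform`, `sum_translation_x`) and the assumption that the translation x component lives at `m[12]`: the transforms are filled before any worker starts, every worker only reads them through a `const` reference, and each worker writes to its own output slot exactly once.

```cpp
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical layout: a 4x4 world matrix stored as 16 contiguous floats.
struct Transform {
    float m[16];
};

// Sum m[12] over all transforms using `worker_count` threads.
// `transforms` is never written while the workers run, so its cache lines
// can be held by several cores at once without invalidating each other.
float sum_translation_x(const std::vector<Transform>& transforms,
                        unsigned worker_count) {
    assert(worker_count > 0);
    std::vector<float> partial(worker_count, 0.0f);
    std::vector<std::thread> workers;
    const std::size_t n = transforms.size();
    for (unsigned w = 0; w < worker_count; ++w) {
        workers.emplace_back([&transforms, &partial, w, worker_count, n] {
            // Each worker takes one contiguous chunk of the array.
            const std::size_t begin = n * w / worker_count;
            const std::size_t end = n * (w + 1) / worker_count;
            float local = 0.0f;  // accumulate privately
            for (std::size_t i = begin; i < end; ++i) {
                local += transforms[i].m[12];
            }
            partial[w] = local;  // one write per worker, to its own slot
        });
    }
    for (std::thread& t : workers) t.join();
    float total = 0.0f;
    for (float p : partial) total += p;
    return total;
}
```

The only shared writes are the final stores into `partial`, one per worker. A worker that wrote into the shared transforms instead would force the other cores to drop their copies of that line.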

In short, write a transform once and treat it as immutable. Avoid reusing the Transform allocation for the next few frames, so that its cache lines have most likely left the other cores' caches by the time you write to it again. If you do need to reuse it right away, you can force-invalidate cache lines by address, so that the single writer in the next step is the sole (O)wner and no other caches need to invalidate anything.
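One way to avoid reusing an allocation right away is to rotate through a small ring of per-frame buffers. The sketch below is only an illustration: `TransformRing`, `write_buffer`, `publish`, and `read_buffer` are invented names, and it leaves out the synchronization a real engine would need between the writer and the readers. It delays reuse by N - 1 frames, but it cannot guarantee that the old lines have actually been evicted.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical layout: a 4x4 world matrix stored as 16 contiguous floats.
struct Transform {
    float m[16];
};

// Ring of N per-frame transform buffers. The single writer fills the next
// buffer while readers keep using the published one; a buffer is written
// again only after N - 1 other frames have been published.
template <std::size_t N>
class TransformRing {
    static_assert(N >= 2, "need at least one read and one write buffer");

public:
    explicit TransformRing(std::size_t count) {
        for (auto& buf : buffers_) buf.resize(count);
    }

    // Buffer the single writer may fill for the upcoming frame.
    std::vector<Transform>& write_buffer() {
        return buffers_[(published_ + 1) % N];
    }

    // Make the freshly written buffer the one readers see.
    void publish() { published_ = (published_ + 1) % N; }

    // Most recently published buffer; readers treat it as immutable.
    const std::vector<Transform>& read_buffer() const {
        return buffers_[published_];
    }

private:
    std::array<std::vector<Transform>, N> buffers_;
    std::size_t published_ = 0;
};
```

With `TransformRing<3>`, the frame loop would fill `write_buffer()`, call `publish()` once the writer is done, and hand `read_buffer()` to every system for the rest of the frame.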


Thanks - I'll have to do a bit more learning to really understand this, e.g. how mutability relates to cache lines and what "Shared" means in that context, but this gives me some good practical direction and insight to take it further :)



