My main introduction to caches, cache sharing, etc. was from
UNIX(R) Systems for Modern Architectures: Symmetric Multiprocessing and Caching for Kernel Programmers
Fantastic, I just had one of those aha moments. This finally made me get why we use set-associative caches: the initial lookup by set index is the key, because it narrows the search to a single set, so only that set's few tags need to be compared. Somehow other written material and various lecturers failed to communicate that to me so clearly. Diagrams FTW.
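For anyone who prefers code to diagrams, here is a rough sketch of the set-index idea as a toy model. The sizes (64-byte lines, 64 sets, 4 ways), the LRU replacement policy, and the names `split_address` and `SetAssociativeCache` are all assumptions made up for illustration, not taken from the book or from any particular CPU.

```python
from collections import OrderedDict

LINE_SIZE = 64   # bytes per cache line (assumed for illustration)
NUM_SETS = 64    # number of sets (assumed)
WAYS = 4         # lines per set, i.e. the associativity (assumed)


def split_address(addr):
    """Split an address into (tag, set_index, offset)."""
    offset = addr % LINE_SIZE
    line_number = addr // LINE_SIZE
    set_index = line_number % NUM_SETS
    tag = line_number // NUM_SETS
    return tag, set_index, offset


class SetAssociativeCache:
    """Toy model that tracks only tags, not the cached data."""

    def __init__(self):
        # One small tag store per set, kept in LRU order.
        self.sets = [OrderedDict() for _ in range(NUM_SETS)]

    def access(self, addr):
        """Return True on a hit; on a miss, fill the line and return False."""
        tag, set_index, _ = split_address(addr)
        ways = self.sets[set_index]   # step 1: index directly into one set
        if tag in ways:               # step 2: check only that set's tags
            ways.move_to_end(tag)     # mark as most recently used
            return True
        if len(ways) >= WAYS:
            ways.popitem(last=False)  # evict the least recently used line
        ways[tag] = True
        return False


cache = SetAssociativeCache()
print(cache.access(0x1000))  # first touch of the line: miss
print(cache.access(0x1004))  # same line, different offset: hit
```

The point is that `access` never looks at more than `WAYS` tags, however large the cache is. Real hardware compares those few tags in parallel rather than via a dictionary, but the two-step shape (index, then tag match) is the same.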
Also, this is a pretty good read about memory and cache stuff: http://people.redhat.com/drepper/cpumemory.pdf