Look at section 6.3.2 for software prefetching. Like the rest of the paper, it's very x86-centric, but most of the concepts apply to other architectures.
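As a rough idea of what software prefetching looks like without tying it to x86, here is a minimal sketch using GCC/Clang's `__builtin_prefetch`, which compiles to the target's prefetch instruction where one exists (or to nothing). It uses an indirect access pattern, since that is the kind of access hardware prefetchers tend to miss and explicit prefetches can help with. The names `sum_indirect` and `PREFETCH_DISTANCE` are hypothetical, and the distance of 16 is an arbitrary placeholder you would tune by measuring.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical prefetch distance: how many iterations ahead to request data.
   Too small and the load has not arrived yet; too large and it may be evicted
   before use. 16 is a placeholder, not a recommendation. */
#define PREFETCH_DISTANCE 16

/* Sum data[idx[0]] .. data[idx[n-1]], asking the CPU to start fetching
   the element needed PREFETCH_DISTANCE iterations from now. */
long sum_indirect(const long *data, const size_t *idx, size_t n)
{
    assert(n == 0 || (data != NULL && idx != NULL));

    long total = 0;
    for (size_t i = 0; i < n; i++) {
        if (i + PREFETCH_DISTANCE < n) {
            /* args: address, rw (0 = read), locality (3 = high temporal locality) */
            __builtin_prefetch(&data[idx[i + PREFETCH_DISTANCE]], 0, 3);
        }
        total += data[idx[i]];
    }
    return total;
}
```

The prefetch is only a hint: it doesn't change the result, so the function returns the same sum with or without it. Whether it actually speeds anything up depends on the access pattern and the machine.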