Hi. Gustavo Pezzi here (from pikuma.com). You're absolutely right; I'll make sure to pass this to Gabriel and I'm sure we'll add this detail to the blog post soon. Thank you for the heads up.
Lots of fun stuff here. Last I checked, Intel's hardware prefetching only worked in the positive (ascending-address) direction, which is a reason to have loops move forward through memory. Also, the number of simultaneous prefetch streams depends on the microarchitecture.
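To make the forward vs. backward point concrete, here is a minimal sketch of the two traversal orders. The function names `sum_forward` and `sum_backward` are my own, not from the article, and whether the two actually differ in speed is an assumption that depends on the CPU and its prefetchers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: sum the array walking from the lowest address
 * to the highest (the "positive" direction). */
uint64_t sum_forward(const uint32_t *data, size_t n)
{
    assert(data != NULL || n == 0);
    uint64_t sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += data[i];
    return sum;
}

/* Hypothetical helper: same sum, walking from the highest address
 * down to the lowest. The result is identical; only the memory
 * access order changes. */
uint64_t sum_backward(const uint32_t *data, size_t n)
{
    assert(data != NULL || n == 0);
    uint64_t sum = 0;
    for (size_t i = n; i-- > 0; )
        sum += data[i];
    return sum;
}
```

To compare them, time each call on a buffer much larger than the last-level cache and repeat the measurement a few times. Any gap you see (or don't see) is specific to that machine, so treat it as an experiment rather than a rule.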