
Sparse matrix math basically boils down to indirect array references: A[B[i]]. GPUs generally trade memory latency for bandwidth, relying on having lots of independent work in flight to hide that latency. But because there's no work between the first load and the second, there is nothing left to hide the second load's latency behind.
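
To make the pattern concrete, here's a minimal CUDA sketch of CSR sparse matrix-vector multiply (the names and layout are illustrative, not from any particular library). The load of x[...] can't even be issued until col_idx[...] comes back:

    // Each thread computes one row of y = A*x, with A in CSR format.
    __global__ void spmv_csr(const int *row_ptr, const int *col_idx,
                             const float *vals, const float *x,
                             float *y, int num_rows)
    {
        int row = blockIdx.x * blockDim.x + threadIdx.x;
        if (row >= num_rows) return;

        float sum = 0.0f;
        for (int j = row_ptr[row]; j < row_ptr[row + 1]; ++j) {
            // First load: col_idx[j]. Second load: x[col_idx[j]].
            // The second depends on the first -- the A[B[i]] pattern.
            sum += vals[j] * x[col_idx[j]];
        }
        y[row] = sum;
    }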

CPUs, by contrast, have a deep cache hierarchy aimed at minimizing memory latency, so the second load doesn't take nearly as long as it does on a GPU.

Yeah, on the GPU you ideally want adjacent threads to load consecutive memory locations (coalesced access) to utilize the memory bandwidth properly. Random indexing blows this out of the water. I guess you could pre-process on the CPU, though, to pack the sparse data into a more GPU-friendly layout, as in the sketch below...
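
Rough sketch of what that packing could buy you. ELLPACK is one assumed choice among several (the format and names here are mine, not the parent's): store the matrix column-major so at step k, thread `row` reads index k * num_rows + row, and adjacent threads hit adjacent addresses. The gather from x[] is still random, but the index/value traffic now coalesces:

    // ELLPACK SpMV: rows padded to max_nnz_per_row, stored column-major.
    // Short rows are padded with col = -1.
    __global__ void spmv_ell(const int *ell_cols, const float *ell_vals,
                             const float *x, float *y,
                             int num_rows, int max_nnz_per_row)
    {
        int row = blockIdx.x * blockDim.x + threadIdx.x;
        if (row >= num_rows) return;

        float sum = 0.0f;
        for (int k = 0; k < max_nnz_per_row; ++k) {
            int idx = k * num_rows + row;   // column-major: coalesced loads
            int col = ell_cols[idx];
            if (col >= 0)                   // skip padding
                sum += ell_vals[idx] * x[col];
        }
        y[row] = sum;
    }

The CPU-side conversion from CSR to this layout is cheap and can be done once if the sparsity pattern is reused across many multiplies.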


You can work around this by using cuckoo or robin hood hashing. See for example: https://www.researchgate.net/scientific-contributions/148064...
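
For what it's worth, the reason cuckoo hashing helps here (the sketch below is my own illustration, not code from the linked paper): each key has a fixed set of candidate slots computed from the key alone, so the probe loads are independent of each other and can all be in flight at once, instead of forming a chain of dependent loads:

    // Cuckoo hash lookup with two candidate slots per key.
    // Hash constants are arbitrary illustrative multipliers.
    __device__ unsigned h1(unsigned k) { return k * 2654435761u; }
    __device__ unsigned h2(unsigned k) { return k * 40503u + 2166136261u; }

    __global__ void cuckoo_lookup(const unsigned *table_keys,
                                  const unsigned *table_vals,
                                  unsigned capacity,
                                  const unsigned *queries,
                                  unsigned *out, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i >= n) return;

        unsigned key = queries[i];
        unsigned s1 = h1(key) % capacity;
        unsigned s2 = h2(key) % capacity;

        // Both slot addresses depend only on the key, so these two
        // loads are independent and can overlap. At most two probes
        // per lookup, ever.
        unsigned k1 = table_keys[s1];
        unsigned k2 = table_keys[s2];

        if (k1 == key)      out[i] = table_vals[s1];
        else if (k2 == key) out[i] = table_vals[s2];
        else                out[i] = 0xFFFFFFFFu;   // not found
    }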
