
I read the linked article and the paper linked from it. Basically, the idea is that gather/scatter can be very inefficient from a cache and bandwidth perspective: in the worst case you use only a single element per cache line. So the proposal is to "move" the scatter/gather engine to the memory controller and pack the vectors already in the cache rather than in the register file.
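To put a rough number on the "single element per cache line" worst case, here is a back-of-the-envelope sketch. It is not from the article or the paper. The helper name `gather_utilization`, the 4-byte element size, the 64-byte line size, and the assumption that the array starts on a line boundary are all mine, chosen only for illustration.

```python
def gather_utilization(indices, elem_size=4, line_size=64):
    """Fraction of fetched cache-line bytes that a gather actually uses.

    Assumes (hypothetically) that the array base is aligned to a cache
    line and that every touched line is fetched in full exactly once.
    """
    unique = set(indices)
    # Cache lines touched by the gathered elements.
    lines = {(i * elem_size) // line_size for i in unique}
    useful_bytes = len(unique) * elem_size
    fetched_bytes = len(lines) * line_size
    return useful_bytes / fetched_bytes


# 16 contiguous 4-byte elements fit exactly in one 64-byte line.
dense = gather_utilization(range(16))

# 16 elements spaced 64 bytes apart: each one lands in its own line.
strided = gather_utilization(range(0, 256, 16))

print(dense, strided)  # 1.0 0.0625
```

Under those assumptions the strided gather pulls in 1024 bytes to use 64, so about 94% of the fetched bandwidth is wasted. That waste is what packing the vector closer to memory would be trying to avoid.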

Will it work in reality? No idea, but it's an interesting idea certainly worth exploring.



