Well, depends on the size of the hash table and the particular memory access patterns. Lookups into many small hash tables, or workloads in which most threads in a threadgroup all fetch the same entry, can be very efficient on GPUs. Sparse virtual texturing is often implemented with a hash table and works well on GPUs because the hash table involved has both of these properties.
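
To make the broadcast case concrete, here's a minimal CUDA sketch (the kernel name, table layout, and hash function are all hypothetical, not from any particular codebase): an open-addressing table in global memory, where lanes that probe the same slot are serviced by a single cache-line fetch broadcast across the warp.

    struct Entry { unsigned int key; unsigned int value; };

    #define EMPTY_KEY 0xFFFFFFFFu

    __device__ unsigned int slot_of(unsigned int k, unsigned int capacity) {
        return (k * 2654435761u) % capacity;  // Knuth multiplicative hash
    }

    // Each thread looks up one key. If most threads in a warp carry the
    // same key, their probes hit the same cache line, so the hardware
    // broadcasts one load to all lanes instead of issuing 32 separate ones.
    __global__ void lookup(const Entry* table, unsigned int capacity,
                           const unsigned int* keys, unsigned int* out, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i >= n) return;
        unsigned int k = keys[i];
        unsigned int s = slot_of(k, capacity);
        while (true) {                                // linear probing
            Entry e = table[s];
            if (e.key == k)         { out[i] = e.value;   return; }
            if (e.key == EMPTY_KEY) { out[i] = EMPTY_KEY; return; }  // miss
            s = (s + 1) % capacity;
        }
    }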

(I'm sure you know this, just wanted to clarify.)

Yes, a very good point. I am assuming the tables are quite large due to the workload. If a table is large enough to benefit from large pages through reduced DTLB misses, it's likely too large for warp-local memory :)
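
To put rough (assumed) numbers on that: a block's statically allocated shared memory tops out around 48 KB on most parts, i.e. about 6K eight-byte entries, while a single 2 MB hugepage already spans ~256K such entries. A toy sketch (hypothetical kernel and sizes) of the small-table case that does fit:

    struct Entry { unsigned int key; unsigned int value; };

    #define TABLE_SLOTS 6144   // 6144 * 8 B = 48 KB, the classic static limit

    // Stage a small table into shared memory once per block, then answer
    // lookups from there. Any table big enough to stress the DTLB would be
    // orders of magnitude over this budget.
    __global__ void shared_lookup(const Entry* table, const unsigned int* keys,
                                  unsigned int* out, int n) {
        __shared__ Entry cache[TABLE_SLOTS];
        for (int s = threadIdx.x; s < TABLE_SLOTS; s += blockDim.x)
            cache[s] = table[s];                      // cooperative copy-in
        __syncthreads();
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n)                                    // toy direct-mapped probe
            out[i] = cache[(keys[i] * 2654435761u) % TABLE_SLOTS].value;
    }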

"We pre-allocate 1000 2MB Hugepages and 10 1GB Hugepages which are found to be enough for both Delicious-200K and Amazon-670K datasets."

So ~12 GB total (1000 × 2 MB ≈ 2 GB, plus 10 × 1 GB = 10 GB) for those datasets.
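
For anyone curious what preallocation like that usually looks like: on Linux you reserve the pools up front (vm.nr_hugepages via sysctl, or hugepagesz=1G hugepages=10 on the kernel command line) and then map them explicitly. The quote doesn't show the paper's actual mechanism, so this host-side C sketch is just one plausible way to grab those pools:

    #include <stdio.h>
    #include <sys/mman.h>
    #include <linux/mman.h>   // MAP_HUGE_2MB, MAP_HUGE_1GB (Linux-specific)

    int main(void) {
        size_t pool_2m = 1000UL * (2UL << 20);  // 1000 x 2 MB hugepages ~= 2 GB
        size_t pool_1g =   10UL * (1UL << 30);  // 10 x 1 GB hugepages    = 10 GB

        void* p = mmap(NULL, pool_2m, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB | MAP_HUGE_2MB,
                       -1, 0);
        void* q = mmap(NULL, pool_1g, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB | MAP_HUGE_1GB,
                       -1, 0);
        if (p == MAP_FAILED || q == MAP_FAILED) {
            perror("mmap hugepage");  // fails if the pools weren't reserved
            return 1;
        }
        return 0;
    }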