I know that's probably a naive answer; I honestly don't even know how a hash table works. I do know how a hash map works: at least some implementations use a linked list as a bucket. The hash gives you the bucket, then you linearly search the bucket for the element. Buckets should stay small so the time to search them is negligible, giving O(1) lookup and insert performance.
Obviously this is different from what's being discussed here; this data structure doesn't even really get "full". But it's also the only implementation I know is in practical use. I'm not sure why one might use a hash table instead.
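The chaining design described above can be sketched like this (a minimal toy, assuming Python's built-in `hash` and a fixed bucket count; a real implementation would grow the bucket array as it fills):

```python
class ChainedMap:
    """Toy hash map with separate chaining: each bucket is a list."""

    def __init__(self, nbuckets=8):
        self.buckets = [[] for _ in range(nbuckets)]

    def _bucket(self, key):
        # The hash picks the bucket...
        return self.buckets[hash(key) % len(self.buckets)]

    def insert(self, key, value):
        b = self._bucket(key)
        for i, (k, _) in enumerate(b):
            if k == key:                 # key already present: overwrite
                b[i] = (key, value)
                return
        b.append((key, value))           # otherwise append to the chain

    def lookup(self, key):
        # ...then we linearly search the (hopefully short) chain.
        for k, v in self._bucket(key):
            if k == key:
                return v
        return None
```

As long as keys spread evenly and chains stay short, both operations are O(1) on average, which is the property the comment is describing.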
So using buckets with linked lists like that is neither space efficient nor fast. A strategy that is more common (and faster) nowadays is to store conflicting entries in the table itself, using some policy to find a free slot for the new entry. The simplest (but not optimal) way to do this is to just take the next one that isn't used yet.
This means a linear scan that will approach O(n) once the table gets close to full. To avoid this, better strategies for choosing the next slot to probe are used, along with automatic resizing of the hash table at some occupancy percentage to keep the lookup chains short. Other strategies in use also approach O(n), just requiring resizing at different occupancy percentages. What is new in this approach is that they manage to stay faster than O(n) even at almost full occupancy.
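That simplest policy, open addressing with linear probing, might look like this minimal sketch (no deletion and no resizing; note that inserting a new key into a completely full table would loop forever, which is exactly why real tables resize well before that point):

```python
class ProbedMap:
    """Toy open-addressing hash table with linear probing."""

    def __init__(self, capacity=16):
        self.keys = [None] * capacity
        self.vals = [None] * capacity

    def _slot(self, key):
        # Start at the hashed slot and walk forward until we find
        # either the key itself or an empty slot.
        i = hash(key) % len(self.keys)
        while self.keys[i] is not None and self.keys[i] != key:
            i = (i + 1) % len(self.keys)   # take the next slot
        return i

    def insert(self, key, value):
        i = self._slot(key)
        self.keys[i] = key
        self.vals[i] = value

    def lookup(self, key):
        i = self._slot(key)
        return self.vals[i] if self.keys[i] == key else None
```

The probe sequence here is the "lookup chain": near-empty tables stop after a step or two, while near-full tables degrade toward scanning the whole array.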
>The simplest (but not optimal) way to do this is to just take the next one that isn't used yet.
The linear probe is by far the most efficient way to build a hash table on any modern hardware; nothing else comes close. Everything else leads to cache thrashing on misses. As for the nearly full table: that's a mistake. The table should not go above a specific fill factor, e.g. the notorious 75% for large tables.
Yes, of course. In practice it still outperforms pretty much anything else. The lower fill factor is still cheaper (in memory footprint) than having buckets and indirection.
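The resize-at-a-fill-factor policy being discussed can be sketched roughly like this (the 0.75 threshold and doubling growth are just the conventional choices mentioned above, not anything prescribed):

```python
class ResizingMap:
    """Toy linear-probing table that doubles before exceeding 75% full."""

    LOAD_FACTOR = 0.75

    def __init__(self):
        self.cap = 8
        self.n = 0
        self.keys = [None] * self.cap
        self.vals = [None] * self.cap

    def _insert_into(self, keys, vals, key, value):
        # Linear probe; returns True if this occupied a fresh slot.
        i = hash(key) % len(keys)
        while keys[i] is not None and keys[i] != key:
            i = (i + 1) % len(keys)
        fresh = keys[i] is None
        keys[i] = key
        vals[i] = value
        return fresh

    def insert(self, key, value):
        if (self.n + 1) / self.cap > self.LOAD_FACTOR:
            # Double the table and rehash everything, keeping chains short.
            self.cap *= 2
            old = list(zip(self.keys, self.vals))
            self.keys = [None] * self.cap
            self.vals = [None] * self.cap
            for k, v in old:
                if k is not None:
                    self._insert_into(self.keys, self.vals, k, v)
        if self._insert_into(self.keys, self.vals, key, value):
            self.n += 1

    def lookup(self, key):
        i = hash(key) % self.cap
        while self.keys[i] is not None:
            if self.keys[i] == key:
                return self.vals[i]
            i = (i + 1) % self.cap
        return None
```

Memory-wise this is the tradeoff from the comment: a flat array at ≤75% occupancy wastes some slots, but avoids the per-entry pointer and allocation overhead of chained buckets.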