Hacker News new | past | comments | ask | show | jobs | submit login

It does, though. See the diagram in the article. There's two allocated arrays there: one for the real hash table (on top) and one for the "backing array" which then stores the actual items. The "extra" bit is the indexes in the top allocation: those indexes would be omitted in a normal (unsorted) hash map. (That is, a normal hash map allocates a single array, which is the storage for both the hash table and the items it contains.¹)

(There's also some extra time requirements, too, to chase that additional pointer from the top table to the bottom one.)

¹N.b. in some languages those items will be references, but that doesn't matter for the comparison here.




Note that the bottom array can be densely packed, whereas the top one must have gaps to avoid excessive probing. If you only had one array you would have to put the gaps in that array, and it is at least 2 words per entry (much worse if you are inlining objects). So it's not so clear. If you want to save space you can also omit the hash bits stored in the index and reduce the width of the index entries to 16 or 8 bits on small maps.

In practice the compact dict does very well on space. Much better than Java's unordered bucket-based HashMap for example.

It's true that there's an extra pointer chase.


Ah, that's an excellent point, and it does muddle the water about which would win out.

Deletes could still leave holes in the storage array, but I think I agree (and your article also states it) that the common case is likely fill table with data, and then perform lookups.

(And I had thought bucket-based hashmaps had very much fallen out of favor to the point where they really weren't the choice of standard libraries anymore, precisely because of the additional allocations & pointer chasing they required? I'm not a Java person.)


C++ unordered_map unfortunately forces a bucket-based implementation. I don't know about Java.


The existence of the Map.Entry objects in Java implies a bucket based approach, but I don't think that's how it works these days, except perhaps for LinkedHashMap.

This stack overflow answer implies the Map.Entry objects are reused during iteration which means they don't need to exist the rest of the time and a flat KVKVKV backing array could be used. https://stackoverflow.com/questions/5455824/doesnt-that-iter...

Alternatively, new Entry objects are created for each step of the iteration, then heavy inlining could enable the JIT to eliminate that allocation again.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: