It does, though. See the diagram in the article. There's two allocated arrays th...

ErikCorry · on July 1, 2021

Note that the bottom array can be densely packed, whereas the top one must have gaps to avoid excessive probing. If you only had one array you would have to put the gaps in that array, and it is at least 2 words per entry (much worse if you are inlining objects). So it's not so clear. If you want to save space you can also omit the hash bits stored in the index and reduce the width of the index entries to 16 or 8 bits on small maps.

In practice the compact dict does very well on space. Much better than Java's unordered bucket-based HashMap for example.

It's true that there's an extra pointer chase.

deathanatos · on July 1, 2021

Ah, that's an excellent point, and it does muddle the water about which would win out.

Deletes could still leave holes in the storage array, but I think I agree (and your article also states it) that the common case is likely fill table with data, and then perform lookups.

(And I had thought bucket-based hashmaps had very much fallen out of favor to the point where they really weren't the choice of standard libraries anymore, precisely because of the additional allocations & pointer chasing they required? I'm not a Java person.)

pclmulqdq · on July 2, 2021

C++ unordered_map unfortunately forces a bucket-based implementation. I don't know about Java.

ErikCorry · on July 2, 2021

The existence of the Map.Entry objects in Java implies a bucket based approach, but I don't think that's how it works these days, except perhaps for LinkedHashMap.

This stack overflow answer implies the Map.Entry objects are reused during iteration which means they don't need to exist the rest of the time and a flat KVKVKV backing array could be used. https://stackoverflow.com/questions/5455824/doesnt-that-iter...

Alternatively, new Entry objects are created for each step of the iteration, then heavy inlining could enable the JIT to eliminate that allocation again.