Hacker News new | past | comments | ask | show | jobs | submit login

Rust has hash maps that hate you.

The default hash map visits the keys in arbitrary order (https://doc.rust-lang.org/std/collections/struct.HashMap.htm...).

You could use a BTreeMap, but that's not insertion order and requires the user to come up with some sort of ordering.




They don't hate you. That's how it's supposed to work. Otherwise you get issues like the one discovered in the article.

Would you say Rust has Vecs that hate you because prepend isn't O(1)? Or does JavaScript have Maps that hate you because they aren't sorted?

There's no perfect container. It's all trade-offs.


The article also explains how the issue discovered can be fixed.

And JS maps are insertion ordered.


The article doesn't expand much on the performance cost. Another comment mentions Swiss tables, which are what rust HashMap is based on these days, and it's unlikely that python's maps are as optimised. Not to mention the unavoidable memory overhead.


Which unavoidable memory overhead?

The hash map as described in the article doesn't seem to waste any memory.


The indices. The article stores Hash+Index+Key+Value, compared to storing Hash+Key+Value.


It's not that simple.

The index table needs to be sparse (for the hashing to work), and thus has quite a few entries unused. It also wants to be small in memory to improve cache locality.

Therefore you want to have the entries in the index-table as small as possible.

As example, let's say you have n entries in your map which is 2/3 full. Let's assume the key and the value each take 1 word.

In the index+key+value case you would thus use: n1.5 words for the indexes, and n2 words for the keys/values. -> 3.5 words per stored key/value.

In the key+value case you would need (2n)1.5 words. -> 3 words per stored key/value.

However, that assumes that the index-table uses a full word per entry. That's usually not the case, as you can use smaller types while the map is small enough. On 64-bit machines, the index-table will rarely use more than 32-bit arrays. In that case you would only need 2.75 words per stored key/value.

That's not really fair, though, as the value/key table has some unused space, so that it doesn't reallocate/copy every time a new key/value is added. Implementations can play with how much extra space is allocated for those unused slots.

Also, the "hash" you mentioned can be combined with the redirecting index. For Toit (the map of the blog post), the index table uses 32 bits. In these 32 bits it stores both (part of) the hash code, as well as the redirecting index.

It might be the case that the hash+key+value approach uses less memory, but I would not be confident to say that it's always the case.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: