
The most important thing is that there is some defined order. This makes it easier to debug, and reproducibility is just a very nice property (see https://en.wikipedia.org/wiki/Reproducible_builds). Simple stuff like making golden output files for your tests becomes easier.

Certainly, having an ordering on the keys themselves (e.g. alphabetical order) is an alternative to insertion order. I'm fine with that. But the built-in maps in Go, Perl, or JS, for example, don't support that.

So often you end up getting the keys, and then immediately sorting them before iterating. That's taking something that should be O(n) and making it O(n log n). I remember doing this all the time in Perl.
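
For example (a minimal Perl sketch, with a hypothetical %counts hash standing in for whatever you built from input), the sort on every full pass is where the extra O(n log n) comes from:

  # %counts is a hypothetical hash built from earlier input
  my %counts = (pear => 2, apple => 5, fig => 1);

  # sorting the keys turns an O(n) walk into an O(n log n) one
  for my $key (sort keys %counts) {
      print "$key => $counts{$key}\n";
  }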

And often there's no order, so you have to create one for your key type. That's extra work, and it's just easier to use a data structure that doesn't require it.

I would be interested to see performance comparisons for maps where the keys are ordered without having to sort on each iteration. Often they have tree structures internally, which means cache-unfriendly pointer chasing, but you can probably remove most of that with wider nodes, as in B-trees (access time is still O(log n), just with a much larger branching factor).




For perl you can get this, but you have to explicitly ask for it from outside the interpreter. This is done by setting the hash seed and telling it not to add turbulence at runtime:

export PERL_HASH_SEED=0; export PERL_PERTURB_KEYS=0;
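
As a quick sanity check (assuming a perl recent enough to have PERL_PERTURB_KEYS, i.e. 5.18+), running a one-liner like this twice should print the keys in the same order both times, though still not in insertion order:

  PERL_HASH_SEED=0 PERL_PERTURB_KEYS=0 perl -E 'my %h = map { $_ => 1 } "a".."e"; say join ",", keys %h'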

The article doesn't mention why python and perl made the choice to add more randomness, but it's because there were attacks out there where people could determine ahead of time which bucket keys would land in and compute a set of keys that caused pathological behavior in anything that built a hash from user input, commonly URL parameters or POST requests. This meant that a simple web request with the right payload could cause your application to continually reallocate memory and expand the hash to accommodate the inputs. Usually this just burned CPU time and made other requests time out, but it could also result in excessive memory usage and the OOM killer coming into play.

The above environment variables set a known seed and then don't mix any new randomness in. This doesn't get you insertion order, but it will make the hash itself as reproducible as possible (if you don't insert the keys in the same order, your results will still vary). I used this in a now-defunct (because of second/third/fourth-system syndrome) fuzzer I had going to detect problems in prerelease versions of perl.


I don't think compact dictionaries are necessarily vulnerable to this issue. You could randomize the hash seed without affecting the iteration order.


It definitely depends on the implementation, and I imagine you're right that you could make it secure pretty easily without affecting iteration order. But switching away from the non-compact, non-guaranteed-order implementation would have resulted in changes to perl's internal API that would have been far more difficult to make. This is because of some design decisions in the past that force perl to be rather conservative about internal interpreter changes.

Perl uses a system called XS for writing C code that interacts with the perl interpreter. XS doesn't have a defined API other than "what the perl internals do", because it's not really a C API but a way to extend the interpreter itself (you can replace parts of the parser and do some other shenanigans), and it exposes basically all of the fields, structures, and functions used inside the interpreter to every C extension to the language.

There are some nice modern alternatives to doing things this way, but the end result is that unless we want to break just about every library on CPAN that uses native code, some care has to be applied to not change too much of the API at any given time. That's also why there's Devel::PPPort, which wraps both new and old APIs/structures to provide some amount of compatibility between perl versions where there is otherwise none.


> The most important thing is that there is some defined order. This makes it easier to debug, and reproducibility is just a very nice property (see https://en.wikipedia.org/wiki/Reproducible_builds). Simple stuff like making golden output files for your tests becomes easier.

Relying on this type of behavior when it isn't required is somewhat of an anti-pattern, and it can often lead to mindlessly copying an implementation's output into tests.

A quick aside: any sort of generic ordered container that supports O(n) iteration must necessarily require O(n log n) work to construct in the average case. If it didn't, you could beat the Ω(n log n) comparison-sort lower bound just by inserting n keys and iterating. So the asymptotic complexity is necessarily the same as sorting the keys of a hash map with undefined iteration order to artificially create a defined order.

It isn't too hard to simulate insertion-order iteration by maintaining an array alongside the hash map. And surprisingly enough, the space and runtime complexity are practically identical to containers that implement insertion order iteration ;-)
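
Here's a minimal sketch of that in Perl (names hypothetical): a plain hash holds the values while an array records first-insertion order, so a full walk stays O(n) with no sorting.

  use strict;
  use warnings;

  my (%values, @order);

  sub ordered_insert {
      my ($key, $value) = @_;
      # remember the key only the first time we see it
      push @order, $key unless exists $values{$key};
      $values{$key} = $value;
  }

  ordered_insert($_, length $_) for qw(pear apple fig);

  # O(n) iteration in insertion order
  print "$_ => $values{$_}\n" for @order;

Deletion is the fiddly part of this naive version (splicing a key out of @order is O(n)); as I understand it, compact-dict style implementations handle that by leaving tombstones in the entry array and compacting occasionally.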


> any sort of generic ordered container that supports O(n) iteration must necessarily require O(n log n) work to construct in the average case

This doesn't apply if the order you want is insertion order.

These hash maps are constructed in O(n) on average, just like any other hash maps.

Java's LinkedHashMap is also constructed in O(n).


"ordered container" meaning that the container is ordered based on the objects it contains.



