
The most egregious offenders here IMO are C++'s map and unordered_map. The spec constrains the possible implementations, and it's hard to implement them as anything better than a red-black tree or a poorly optimized hash table, respectively. And really the unordered one should be the default with the nice short name.

It looks like Rust is doing well on this. HashMap (the one everybody uses) is essentially a clone of Google's heavily-optimized C++ hash table:

https://abseil.io/blog/20180927-swisstables

... and if you need the collection to be sorted, their BTreeMap has much lower overhead than the C++ std::map, plus a big prominent note in its documentation saying that HashMap is usually the one you want.
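A minimal sketch of the two collections mentioned above (names and values are made up for illustration): HashMap for the common case, BTreeMap when you need the keys back in sorted order.

```rust
use std::collections::{BTreeMap, HashMap};

fn main() {
    // HashMap: the default choice; unordered, O(1) average lookup.
    let mut scores: HashMap<&str, i32> = HashMap::new();
    scores.insert("carol", 3);
    scores.insert("alice", 1);
    scores.insert("bob", 2);
    assert_eq!(scores.get("bob"), Some(&2));

    // BTreeMap: keeps keys sorted, so iteration yields them in order.
    let sorted: BTreeMap<&str, i32> =
        scores.iter().map(|(k, v)| (*k, *v)).collect();
    let keys: Vec<&str> = sorted.keys().copied().collect();
    assert_eq!(keys, ["alice", "bob", "carol"]);
}
```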

I'm impressed. (Haven't looked at Zig yet.)




> It looks like Rust is doing well on this. HashMap (the one everybody uses) is essentially a clone of Google's heavily-optimized C++ hash table

It didn't start out this way, but their spec was not so constrained as C++, so they were able to switch to this implementation in a backward-compatible way. In particular, Rust's std::collections::HashMap doesn't guarantee that keys and values have stable addresses across mutations of unrelated entries, so they can use open addressing rather than chaining. (And they did use open addressing from the beginning; the SwissTables-like "hashbrown" implementation was just a refinement of that.)

I think a lot of this is just that newer languages can learn from the mistakes of previous languages. It will be interesting to see how Rust (and other newer languages) are able to evolve when their standard library is 20+ years old and some parts don't age so well. (Although actually C++'s std::unordered_map only goes back to C++11 and still sucks...)

One part of Rust's answer I think is to keep the standard library small. Less there, less to screw up. Make it easy to pull in crates instead. Crates can supply the same functionality but can bump their major version relatively easily if the interface has to change. There are advantages and disadvantages to this approach...


The default hash map probably doesn't need the iterator invalidation constraints, but they are quite useful in certain cases.


> anything better than a red-black tree

Is there an equivalent data structure that has better worst-case time complexity than red-black tree?


There are plenty of algorithms that are also O(log n) that are much faster in practice, because of how caches work.


Could you name some please? I'm genuinely interested in learning of data structures with this property.


The simplest one is a B tree. It's like a self-balancing binary tree, except that you put more than two children in each node.

Same big-O, but faster by a big linear multiplier.
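A toy sketch of the idea (hand-built tree, invented names, insertion/balancing omitted): each node packs several sorted keys into one contiguous array, so a whole node fits in a few cache lines, and you pay far fewer pointer hops per lookup than in a binary tree.

```rust
// Toy B-tree-style node. A real implementation would use fixed-capacity
// arrays and handle splitting/merging; this only demonstrates search.
struct Node {
    keys: Vec<i32>,      // sorted keys within the node
    children: Vec<Node>, // empty for a leaf, else keys.len() + 1 children
}

fn contains(node: &Node, key: i32) -> bool {
    // Search within the node is cheap: the keys are contiguous in memory.
    match node.keys.binary_search(&key) {
        Ok(_) => true,
        Err(i) => {
            if node.children.is_empty() {
                false // leaf reached without finding the key
            } else {
                contains(&node.children[i], key) // descend into one child
            }
        }
    }
}

fn main() {
    // Root holds [10, 20]; three leaf children cover the gaps.
    let tree = Node {
        keys: vec![10, 20],
        children: vec![
            Node { keys: vec![1, 5], children: vec![] },
            Node { keys: vec![12, 17], children: vec![] },
            Node { keys: vec![25, 30], children: vec![] },
        ],
    };
    assert!(contains(&tree, 17));
    assert!(!contains(&tree, 4));
}
```

Rust's std BTreeMap is built on exactly this layout idea, with nodes sized to amortize the cost of a cache miss per level.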


If the data is small enough (up to roughly n = 100), a linear scan at O(n) often beats asymptotically faster algorithms.
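For example, on a small sorted slice a branch-predictable sequential scan often outruns binary search in practice, even though its big-O is worse. Both return the same answer (this sketch only checks agreement; actually measuring the crossover would need a benchmark):

```rust
// O(n) sequential search; trivially predictable branches, perfect prefetching.
fn linear_find(xs: &[i32], key: i32) -> Option<usize> {
    xs.iter().position(|&x| x == key)
}

fn main() {
    let xs: Vec<i32> = (0..100).map(|i| i * 2).collect(); // sorted evens, n = 100
    for key in [0, 42, 198] {
        // slice::binary_search is O(log n); both must agree on hits...
        assert_eq!(linear_find(&xs, key), xs.binary_search(&key).ok());
    }
    // ...and on misses.
    assert_eq!(linear_find(&xs, 7), None);
}
```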


What, C++'s spec constrains the implementation of standard data structures? This language has hit a new low for me.


Well, it defines the big-O of the data structure APIs. And to meet all the requirements, the implementation is usually very constrained.


My impression is that it's typically the iterator/reference stability requirements that lock implementations down, not the complexity requirements.


Oh yeah, that too.



