Hacker News new | past | comments | ask | show | jobs | submit login

Be nice to see a hash map here.



Due to popular demand, I will implement unordered_set after the holidays, once the new year is in full swing.


I recommend looking at the khash implementation https://github.com/attractivechaos/klib/blob/master/khash.h It's very competitive to optimized implementations like abseil and much, much better than the C++ STL one.


In the Java stdlib, Set is implemented on top of HashMap. I don't see how the opposite is possible?


Yes, that part of the readme:

> STL std::map will not be implemented in CTL because maps only provide slight syntactic improvements over sets.

makes no sense to me.

It's the other way around: a set is a map with a irrelevant / ignored value (or singleton / ZST if available), with built-in functions for set-oriented operations (union/difference/subset/superset/disjoint). So if you're only going to provide one of them, you provide a map (that's what Go does).

You don't have to implement sets in terms of maps though, it can be useful to provide different implementations of them as their usage patterns can be different and thus it might make sense to use a different underlying data structure (not sure about trees there but definitely a possible consideration for hashes).

It also makes sense to have a bespoke set implementation if the language doesn't have ZSTs, to avoid wasting space (between 1 byte and 1 word per key depending on the storage details).

Finally it can make sense to have a bespoke set implementation due to language visibility rules, so you can e.g. hit the hashes directly (rather than have to recompute them) when performing operations involving multiple sets.


It's straightforward to use a set type as a mapping type, so long as it provides three things:

* Comparisons don't have to use all members of the type e.g. if I have two Employee objects then I'm allowed to create a comparison function that only compares their employee IDs and ignores the name fields (perhaps because the name ought to be derived from employee ID). (I know this article is not about C++, but as a reference point this has always been possible in C++'s STL.)

* Comparisons don't need a full-blown instance of the contained type e.g. in our employee example, the set can make use of a comparison function between int and Employee without having to construct an Employee with that ID. (This has been possible in C++ since C++11, and is often called heterogeneous lookup.)

* The last point is a bit fiddly: Either the set type allows mutation of the objects so long as the parts relevant to comparison don't change, or you only wanted to store const values anyway. I mention this because in C++ if you have a std::set<Foo> then you can't mutate the Foo elements at all, even if it has no effect on the comparison of the elements with other elements. (Unless there are const methods that actually mutate it... yuck!)

So long as you have those three things, you can have a set of (key, value) pairs where the comparitor only compares keys, and bingo you have a mapping type.


This is true, but your second point is difficult to do in C. You would have to have a set template and a separate lookup template to search the set based on a key type. This is actually how heterogeneous lookup is implemented in C++: it's a template within a template. This is not a big deal in C++ because templates are instantiated automatically but we're instantiating templates manually in C. It's also only available as of C++14 and only on ordered containers, not hash tables.

It's more straightforward to combine these templates: make map the fundamental type, but make values contain their keys and make the key type default to the value type if omitted. This way it's both a map and a set. My own C hash table template does it this way [1]: it stores only a value type and not keys, but there is an optional separate key type used for lookups, hashes and comparisons. To make it a map, you provide both a key comparison expression and an expression to convert values to keys.

[1]: https://github.com/ludocode/pottery/tree/master/include/pott...


There's a 4th requirement: that the set has some way to return colliding entries on insertion (or provides access to the entry itself), which most don't. Otherwise lookup requires iterating the entire set.

> It's straightforward

It's really not. Straightforward is the other way around, this is convoluted and involved. It makes the library much harder to use for rather limited gains.

> Either the set type allows mutation of the objects so long as the parts relevant to comparison don't change

This sounds like a footgun requirement at best, I'm pleasantly surprised (for once) that C++ doesn't just let you mutate the entire thing at will.


To be honest, that doesn't sound so straightforward to me. Going the other way and implementing Set on top of HashMap is comparably trivial, making those kind of API acrobatics unnecessary.


a map is a set indexed only on a subset of the fields of your type.

This maps directly to the relational model, where a table (or relation) being a set of tuples is the fundamental concept: for your typical programming language set the uniqueness constraint (and key), would be comprised by all the columns, while for a map it would only be comprised by a subset (just one at the limit), while the remaining columns would be the mapped type.

Set operations (union, intersection, difference) do make sense for maps as well in fact.

Given a powerful enough language, you can generalize your sets and maps with multiple uniqueness constraints and secondary indices. See boost.multindex for example.

edit: and BTW, most STL implementations use the same underlying implementation for std::set and std::map which has a set-like interface with the ability to specify the key mapping function.


Sets (and maps) are implemented as red black trees in the STL. The STL unordered_set (and unordered_map) is what I believe Java would call a HashMap, unless I am mistaken (I am not a Java guru).


I don't know that the STL requires rbtrees for map and set, but you're correct that `std::unordered_map` is the STL's hashmap. The Java version of std::map would be java.util.TreeMap (and std::set => java.util.TreeSet).


The STL's complexity requirements pretty much dictate a balanced binary search tree as the underlying representation for map and set. The choice then is essentially between red-black trees and AVL trees.


A map is a set of key,value pairs, and the set lookup function only considers the key in the comparison.


Remember that C++'s unordered_set is quite slow (e.g. due to the use of lists for hash buckets.)


I will keep this in mind when I implement unordered_set. Thank you, and happy holidays


hooray for hashes!




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: