I would make the point that C++ unordered_map is not remotely optimized for this...

nly · on June 7, 2015

Indeed. std::unordered_* implementations are hogtied by the interface and complexity requirements in the C++ standard (it basically mandates chained buckets), and more generally the need to work with almost any hash function (linear probing would suck with crappy hash functions for example).

Jo Muñoz wrote a cool blog series on the current implementations in circulation, the flaws with the interface, and how his implementation (Boost multi_index) made further optimisations despite the constraints.

Don't miss the other two parts of the series - look under October on the right. He did extensive bookmarks in November as well.

http://bannalia.blogspot.co.uk/2013/10/implementation-of-c-u...

StephanTLavavej · on June 7, 2015

I bookmarked those posts a while ago in the hopes that I'll get a chance to rewrite VC's unordered_map in the future. We've taken the first step towards doing so by deprecating the non-Standard hash_map (which shares its guts with the Standard unordered_map) in 2015, so I can remove it outright at the beginning of the next development cycle. Then the unordered family can be changed without fear of breaking the hash family.

nly · on June 7, 2015

Thanks for what you do STL (For those that don't know, Stephan maintains Microsofts C++ standard library). Even though I don't develop against MSVC at all, improving the state of C++ at Microsoft makes life easier for all of us in terms of portability and bringing nice new things in to common use.

Thank you also for maintaining your GCC distro for Windows[0] and the the many great presentations you have given at CppCon, Channel 9, Going Native etc.

[0] http://nuwen.net/mingw.html

StephanTLavavej · on June 7, 2015

You're welcome! And even if you don't use VC's STL, hopefully make_unique and the other things I've designed will be helpful.

nly · on June 7, 2015

And for making this possible

    vector<int> v = {1,2,3,4,5}; 
    cout << accumulate(begin(v), end(v), 1, multiplies<>());

jaytaylor · on June 8, 2015

I'm curious..

Is it possible to use VIsual C++'s STL inside of Linux?

StephanTLavavej · on June 8, 2015

Nope, our sources assume they're running on Windows.

joaquintides · on June 8, 2015

You might also consult the entries given below, which take into account some late improvements in Boost.MultiIndex not present at the beginning of the series:

http://bannalia.blogspot.com/2014/01/a-better-hash-table.htm... http://bannalia.blogspot.com/2014/01/a-better-hash-table-gcc... http://bannalia.blogspot.com/2014/01/a-better-hash-table-cla...

gsg · on June 8, 2015

That's an admirable sentiment, but aren't your hands tied by the promises made by the standard? In particular the addresses of elements are guaranteed to be stable, which would seem to prevent an implementor from relocating them to achieve more cache-friendly layout.

(Of course, you would want to do as well as possible within the given bounds.)

StephanTLavavej · on June 8, 2015

Yeah, unordered_meow is always going to be a pretty crummy hash table. But our implementation could be significantly improved while still obeying the Standard.

gsg · on June 8, 2015

Fair enough! Good luck with your rewrite.

aaggarwal · on June 7, 2015

While Graph Engine definitely looks useful, I agree that the comparison is unfair. GE is an optimized implementation while associative containers (unordered_*) in C++ are more generalized implementations. The above blog provided clearly explains optimized implementations of associative containers in C++. It would be interesting to see how these compare with GE.

v-yadli · on June 7, 2015

Please note that this test also only captures a single profile of Graph Engine. The cell entry lookup table and the MM module are designed to operate in a more complex environment and not optimized towards the direction of this micro benchmark(concurrency, logging, a strong type system, to name a few). Of course, it would be very interesting to test against other options, especially those concurrent dictionaries combined with custom allocators.