
After quickly glancing at the code, I concluded that they wrote C++ as if it had no static types. It seems they faithfully ported the very dynamic nature of their existing code to C++ without thinking.

Like what is this? https://github.com/ExaScience/elprep-bench/blob/master/cpp/f...

  auto alns = any_cast<shared_ptr<deque<shared_ptr<sam_alignment>>>>(data);
So the data is a sam_alignment inside a shared_ptr inside a deque inside another shared_ptr inside, god forbid, an any? Why did they do that? What kind of abomination is this? Also, from the context, it is the only possible type. They use:

  try { any_cast<abomination_type>(data) ; }
  catch ( bad_any_cast ) { throw runtime_error(...) ; }
If you're so sure an object of any only holds exactly one type and everything else is an unexpected error, you shouldn't use any at all!
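
For comparison, a minimal sketch of the statically typed alternative (the struct and function names here are illustrative stand-ins, not the repo's actual signatures): if the payload can only ever be one type, let the signature say so and the any_cast/try/catch dance disappears, along with a layer of boxing.

  #include <deque>
  #include <string>

  // Hypothetical, stripped-down stand-in for the repo's sam_alignment.
  struct sam_alignment { std::string qname; int pos = 0; };

  // Before (shape of the criticized code): void process(std::any data) + any_cast.
  // After: the type system carries the information, no casts, no shared_ptr layers.
  void process(std::deque<sam_alignment>& alns) {
    for (auto& aln : alns) {
      aln.pos += 1;  // work on the elements directly
    }
  }

  int main() {
    std::deque<sam_alignment> alns{{"r1", 100}, {"r2", 200}};
    process(alns);
  }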

They really like std::deque<T> and use it everywhere, even though sizeof(T) is a few dozen bytes at best, so they should use std::vector instead. The data structure behind deque is a list of arrays. While it can amortize continuous adding of elements to the front or back, since the element size is very small they should use vector instead.

Speaking of data structures, they also use std::unordered_map<int, any>. unordered_map is very slow (it's a node-based hash map, not suitable for modern hardware), and sizeof(int) + sizeof(any) is about 20 bytes (sizeof(int) + two pointers), so they get no benefit from a node-based data structure here. They should use a sorted vector and binary search it instead.
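
A rough sketch of that sorted-vector-plus-binary-search idea (illustrative only; a concrete int value stands in for the any payload):

  #include <algorithm>
  #include <cstdio>
  #include <utility>
  #include <vector>

  using Entry = std::pair<int, int>;  // key, value; contiguous, ~8 bytes per entry

  const int* find_value(const std::vector<Entry>& table, int key) {
    auto it = std::lower_bound(table.begin(), table.end(), key,
                               [](const Entry& e, int k) { return e.first < k; });
    return (it != table.end() && it->first == key) ? &it->second : nullptr;
  }

  int main() {
    std::vector<Entry> table{{3, 30}, {1, 10}, {2, 20}};
    std::sort(table.begin(), table.end());  // build once, keep sorted by key
    if (const int* v = find_value(table, 2)) std::printf("%d\n", *v);  // prints 20
  }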

My conclusion: it's slow because they wrote C++ like a dynamically typed language and they chose the wrong data structures.




The use of 'any' and the ridiculous levels of boxing is bizarre.

There is also absolutely no attempt at using std::move; the amount of gratuitous copies and refcount updates is staggering.
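
Illustrative only (not the repo's code): copying a shared_ptr bumps an atomic reference count, while moving it doesn't, so in code that passes shared_ptr<deque<shared_ptr<...>>> around by value those updates add up.

  #include <memory>
  #include <utility>
  #include <vector>

  int main() {
    auto p = std::make_shared<int>(42);
    std::vector<std::shared_ptr<int>> v;
    v.push_back(p);             // copy: atomic refcount increment
    v.push_back(std::move(p));  // move: the pointer is stolen, no refcount traffic
  }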

Having said that, apart from the use of std::any, the code is fairly clean, even though it is obviously not even remotely written for performance.


It's actually more surprising that a person who can write this otherwise clean and modern C++17 code produced this awful abomination.

The only possible explanation is that he literally ported the existing code, written in a very dynamic language, by hand.

He is not stupid, far from it. It's just that he also ported the dynamic parts of the code all too well.


A copy in function arguments is often elided by the compiler, and it often allows more optimizations.


For relatively small objects with no internal allocation, definitely yes.

Otherwise only a qualified yes: if you are copying it anyway in the body, pass-by-value allows moving the object in and actually avoids copies, but the linked code doesn't use move anywhere at all.
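
A minimal illustration of that point (generic example, not taken from the linked code): when the callee stores the argument anyway, taking it by value and moving it means callers that pass temporaries or std::move pay no copy at all.

  #include <string>
  #include <utility>
  #include <vector>

  struct Record {
    std::string name;
    explicit Record(std::string n) : name(std::move(n)) {}  // by value, then move
  };

  int main() {
    std::vector<Record> records;
    std::string s = "read_0001";
    records.emplace_back(s);             // one copy: the caller still needs s
    records.emplace_back(std::move(s));  // no copy: the buffer is moved through
  }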

As they seem fairly versed in C++, the only reason for doing that is that they want their code to be modifiable by non-programmers, so they want to use a style that is safe and avoids possible use-after-move.

For this scenario, then yes, C++ is going to be slower than a language with a proper garbage collector (although I would like to see a version with a proper shared_string implementation instead of the abomination that is shared_ptr<std::string>).


Definitely a qualified yes. Pass-by-value will often give the compiler less to reason about, since pass by ref/ptr makes it more pessimistic about aliasing. You see this play out when trying to get the auto-vectorizer going by copying N values to a local buffer in the loop prior to operating on them.
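
A sketch of that local-buffer pattern (illustrative, with made-up names): copying through a fixed-size local array removes the possibility that the destination aliases the source, which often lets the auto-vectorizer work on the inner loops.

  #include <cstddef>

  void scale(float* dst, const float* src, std::size_t n, float k) {
    constexpr std::size_t B = 64;
    for (std::size_t i = 0; i < n; i += B) {
      float tmp[B];
      const std::size_t m = (n - i < B) ? n - i : B;
      for (std::size_t j = 0; j < m; ++j) tmp[j] = src[i + j];      // local copy
      for (std::size_t j = 0; j < m; ++j) dst[i + j] = tmp[j] * k;  // vectorizable
    }
  }

  int main() {
    float a[256], b[256];
    for (int i = 0; i < 256; ++i) a[i] = float(i);
    scale(b, a, 256, 2.0f);
  }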

The optimizers in C++ compilers are really good at inlining, even more so within a single TU, so size isn't the issue in those cases either. In addition, on the other side, most optimizers have a form of return value optimization that elides the copy/move on the return value.

Where they often fall short is heap allocations. They seem to hide a lot, and only some compilers can elide a heap allocation.

Use after move is interesting. Many uses of move are not necessary anyway (returning an automatic-storage object), but the complexity of the code already would be on par with using std::move. I don't think an optimizer will do that one even if it is the last usage.
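
For example (ordinary C++, nothing specific to the repo): returning an automatic-storage object already benefits from copy elision or an implicit move, so std::move is unnecessary there and can even disable NRVO.

  #include <string>
  #include <vector>

  std::vector<std::string> make_names() {
    std::vector<std::string> names{"a", "b", "c"};
    return names;  // elided or implicitly moved; writing std::move(names) only hurts
  }

  int main() {
    auto names = make_names();
    (void)names;
  }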


<unpopular opinion>We should break ABI on unordered map just to stop embarrassing ourselves in public and in front of new users.</unpopular opinion>


It’s unfortunately not just ABI, but also API. The standard specifies that you can get iterators to specific buckets in O(1) [1], and also specifies bucket_count(), max_bucket_count(), bucket_size() (which is specified to be O(n)), and bucket(). Those functions and their specified performance make it effectively impossible to implement a standards-compliant std::unordered_map without using separate chaining.

[1] https://en.cppreference.com/w/cpp/container/unordered_map/be...


You can just remove those functions. The real problem is that the API break would invalidate iterators and create undefined behavior.


I'm not even sure that is an unpopular opinion. It's a minority opinion, but people on the committee have been grumbling about how ABI consistency is hamstringing C++ for a while now.


What's wrong with unordered map?


Its implementation is a node-based hash map with separate chaining into buckets. Theoretically its complexity is good, but it's not that efficient on modern hardware, where memory access is heavily cached, so data locality matters more than saving memory copies or memory size. Using a linked list to save some memory copying doesn't help at all, and the overhead is far greater than simply copying a contagious large chunk of memory. Especially so if the data type is trivial, so a simple byte-by-byte memcpy suffices.


> contagious

normally I wouldn't point out an obvious autocorrect mistake, but this is really season-appropriate (or, better, inappropriate) :).


Oh that's... not intended really.


So use unordered_map with an allocator that returns nodes from a small local memory?
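
One way to read that, sketched with the standard C++17 polymorphic allocators (buffer size and key/value types are arbitrary here): back the map's node allocations with a small local arena. Whether the nodes end up laid out in iteration order, especially after a rehash, is a separate question.

  #include <cstddef>
  #include <cstdio>
  #include <memory_resource>
  #include <unordered_map>

  int main() {
    std::byte buffer[4096];  // small local memory backing the nodes
    std::pmr::monotonic_buffer_resource arena{buffer, sizeof(buffer)};
    std::pmr::unordered_map<int, int> m{&arena};  // nodes come from the arena
    for (int i = 0; i < 50; ++i) m.emplace(i, i * i);
    std::printf("%d\n", m.at(7));  // 49
  }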


To be useful that would require that the nodes are allocated in the order they appear in the map (and it still has pointer overhead). Is that a given with an unordered map, or does it break down the moment the map is rehashed?


Why not just define std::dict or something like that, that is better?


Because you should be able to find a good hash data structure for your language. You shouldn't be able to accidentally get a bad one. It should arguably be the first data structure you reach for.


Deprecate the old data structure. Even Java did that. Originally it had the Vector class as primary List type. Nowadays everyone uses the List interface and ArrayLists as default implementation.

You can even find this in the javadoc of the Vector class:

>As of the Java 2 platform v1.2, this class was retrofitted to implement the List interface, making it a member of the Java Collections Framework. Unlike the new collection implementations, Vector is synchronized. If a thread-safe implementation is not needed, it is recommended to use ArrayList in place of Vector.


I think the parent is proposing adding a better hash map to the standard library but with a different name to preserve backward compatibility.


The problem is, a hash map isn't that effective on modern hardware; the overhead cost is greater than the benefit. If the system consists of clustered, network-connected computers it may be, but not on a single local computer for most problems.


> My conclusion, it's slow because they wrote C++ like a dynamic typed language

Here's my conclusion without needing to look at the code: is C++17 so complicated a language that having to understand deeper concepts such as the ones you've mentioned costs you "time to market", for marginal gains over the same code written in Go?

Or put another way: did you just further prove that Go is the better option because it's providing extremely similar performance yet doesn't require this very esoteric knowledge you've mentioned?

If I can get the same work done in Go and I can learn Go in a week versus months for C++17 (which is far more complex), why would I pick C++17?


If you know Go and don't know C++ then yes absolutely use Go. If you're trying to do a performance comparison though it doesn't seem unreasonable to have "basic" familiarity with the languages being compared?

None of the knowledge mentioned is "esoteric". We're not talking hyper optimized insane C++ optimizations here with custom allocators and specialized code generation with templates. We're talking "basic" memory management & standard library data structures.


> If you're trying to do a performance comparison though it doesn't seem unreasonable to have "basic" familiarity with the languages being compared?

No. Because it's not just a performance comparison, really. Without realising it, it's also a learning test. The very fact they wrote sub-par C++ code, likely without realising it, and then proceeded to produce (sub-par?) Go code that was faster demonstrates the difficulty of one language over the other. It demonstrates the cognitive effort required by the two options; one of them was easier to produce with, and the results were more than suitable.

These technology comparisons are important, for sure, but at the end of the day people want an ROI from everything they do, even the things that don't involve money. So if something can be solved in six hours with one solution or 36 hours with another, and the latter only yields a 10% performance improvement, then I'd rather my end goal be ready and "in the market" 30 hours ahead of the other guy who chose the slower route for a 10% speed improvement.


> No. Because it's not just a performance comparison, really. Without realising it, it's also a learning test. The very fact they wrote sub-par C++ code, likely without realising it, and then proceeded to produce (sub-par?

Who or what tells you it's a learning test? It is a pretty weak assumption.

The publication says nothing about the actual past experience of the developers in each language. They might very well be experienced in Go or Java.

From their C++, it is pretty obvious it is not their main language.

In my experience, this kind of code of nested shared_ptr everywhere is pretty typical of developers with a Java background just starting C++.


They clearly want the performance - the algorithms (as you can easily see in the repo) are trivial, it's just a huge amount of data to process.

C++ can shine here - and still look nearly like the Go or Java code (due to the simple algorithm) - but someone tried to write highly sophisticated code (there is no tutorial or book about C++ that teaches you to do it that way), with the result that the C++ is far slower than it could be - we're definitely not talking about a 10% gain here.


> They clearly want the performance - the algorithms (as you can easily see in the repo) are trivial, it's just a huge amount of data to process.

> C++ can shine here

And my point is: it didn't. And it didn't because of the learning curve involved. It's only the better solution if the knowledge is invested in ahead of time, and if the gains don't outweigh those you'd get from Go by an amount that greatly exceeds the time invested to net them, then it's not the better choice.


But your point is just not correct - I know Go, Java and C++ very well, and I don't understand why the C++ code isn't just like the Java code - there is absolutely no need for these deeply nested, ugly container parties that are not in the Java code at all.


The flip side to this is that for someone who knows roughly what a CPU does (you don't need in-depth knowledge to think about pre-fetching and branch prediction) C++ is going to be easier to use to leverage that. That's really the end of it.

I agree with you that writing C++ is harder than writing Go and that writing naïve C++ leading to disproportional pessimization of your program shows that it's not the right tool for people who lack the knowledge to use it.

At the same time it's really not hard to translate simple concepts into simple code for lower-level languages that runs well on modern processors. Basic usage of (possibly) growable, homogeneous arrays will get you far. Contrary to what most C++ app developers seem to think, custom allocators aren't complicated either, and they will guarantee much better cache locality and memory management.
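
As a rough sketch of how small such an allocator can be (entirely illustrative; names and sizes are made up), here is a bump allocator that serves a std::vector from one contiguous block:

  #include <cstddef>
  #include <cstdlib>
  #include <new>
  #include <vector>

  // One contiguous block; allocation is a pointer bump, everything is freed at once.
  struct Arena {
    std::byte* base;
    std::size_t size, used = 0;
    explicit Arena(std::size_t n) : base(static_cast<std::byte*>(std::malloc(n))), size(n) {}
    ~Arena() { std::free(base); }
    void* allocate(std::size_t n, std::size_t align) {
      std::size_t p = (used + align - 1) & ~(align - 1);
      if (p + n > size) throw std::bad_alloc{};
      used = p + n;
      return base + p;
    }
  };

  template <class T>
  struct ArenaAlloc {
    using value_type = T;
    Arena* arena;
    explicit ArenaAlloc(Arena* a) : arena(a) {}
    template <class U> ArenaAlloc(const ArenaAlloc<U>& o) : arena(o.arena) {}
    T* allocate(std::size_t n) { return static_cast<T*>(arena->allocate(n * sizeof(T), alignof(T))); }
    void deallocate(T*, std::size_t) {}  // no-op: the arena releases everything together
  };
  template <class T, class U>
  bool operator==(const ArenaAlloc<T>& a, const ArenaAlloc<U>& b) { return a.arena == b.arena; }
  template <class T, class U>
  bool operator!=(const ArenaAlloc<T>& a, const ArenaAlloc<U>& b) { return a.arena != b.arena; }

  int main() {
    Arena arena(1 << 20);  // 1 MiB up front
    std::vector<int, ArenaAlloc<int>> v{ArenaAlloc<int>(&arena)};
    for (int i = 0; i < 1000; ++i) v.push_back(i);  // all growth stays in the arena
  }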

Most of these issues are made to look hard by people, including the premise that learning the basics of what the processor needs in order to execute your code fast is hard.


That being said, if the time invested in learning C++ plus the extra time needed to write and optimise it isn't repaid by the performance gain over the same solution written in (optimised) Go, then it's not the right choice overall.


You are just wrong - someone without any good C++ knowledge ported the Java port straight to C++, using every crude feature and wild combination of shared_ptr and container nesting in a way you will hardly find in any open source or tutorial project - it's not about the ease of Go, the benchmark results are just wrong in so many ways.


‘Esoteric’ is in the eye of the beholder. Even in Go you gain performance by avoiding boxing.

> If I can get the same work done in Go and I can learn Go in a week versus months for C++17 (which is far more complex), why would I pick C++17?

Because C++17 code if written well is likely to be much faster. If you don’t care about that, by all means stick with Go. But keep in mind that learning is a one-time cost, and you can keep applying the same knowledge to project after project.


> Even in Go you gain performance by avoiding boxing.

Go has boxing?

> Because C++17 code if written well is likely to be much faster.

I was hoping someone could prove this, actually. I don't have the means or the knowledge.


> I was hoping someone could prove this, actually.

Maybe this presentation by Jason Turner from CppCon 2016 can illustrate just what can be done in C++17. His target in the presentation is a C64, but I think it illustrates well just what you can do with the language, and hence what tools are available for writing fast code.

https://www.youtube.com/watch?v=zBkNBP00wJE

edit: If you just want a quick fix, check out what a simple keyword does here: https://youtu.be/zBkNBP00wJE?t=1616


interfaces


That is really not the case here - the Java and C++ implementations are very plain, boring algorithm code (some maps, lists, searching); none of the fancy language features are used to reduce or optimize the code - it's absolutely not a C++-is-hard-to-get-right thing.

The C++ code is just using rarely used features in over-wild combinations - as stated: it seems that someone tried to port highly dynamic script code 1:1 to C++ without any knowledge about C++, and that just makes no sense; even porting the current Java code 1:1 to plain old C++ code would result in a much better performing version of the algorithm.

Everything (even the simplest stuff) is FORCED to be heap-allocated (with a huge amount of code), in the case of deque and shared_ptrs even two or three times over - without any benefit.

I would bet that a cleaner, less every-feature-using version of the C++ algorithm would beat the Java and Go code by a huge factor.


The problem with that idea is that he actually used more advanced features to make it worse (any_cast). This is not how you would do it if you picked up a beginner C++ book and read the first few chapters.


Welcome to scientific programming :)


> unordered_map is very slow (it's a node-based hash map, not suitable for modern hardware)

It is very slow?

I wrote my app in FreePascal. I need a hashmap, but FreePascal has no real standard hashmap, so I have been benchmarking Pascal hashmaps for weeks/months.

Today I added std::unordered_map for comparison. It will still take a day to run the benchmark, but so far it looks like std::unordered_map is 25% faster than the fastest map in the FreePascal standard library. And the best of the 45 Pascal maps is only 30% faster than std::unordered_map. Only 10 maps are faster, and 35 maps are slower.



std::unordered_map is supposedly slow because it has fairly stringent iterator invalidation requirements. In my experience, unless you're wringing every last bit of performance out of your code, it's not a huge issue; it's generally faster than most casual implementations.


Yeah people here are dumping on it for being non-performance-optimal while ignoring that higher performing versions have restrictions and design limitations. unordered_map is fast enough for probably 98% of uses, and the other 2% may be better off finding a specialized version best for their particular case.


It's not very slow. It's faster than `std::map` and we lived with that for years!

It's just not as fast as some other implementations: https://tessil.github.io/2016/08/29/benchmark-hopscotch-map....


Depends what you do. If you populate it only once and then you do a bunch of lookups, assuming a low load factor, you'll be paying for a modulo operation and an extra indirection, not great but not terrible either. If you do a lot of inserts and removal or your load factor is high, then the performance is going to be not great.


My benchmark is one insert, multiple lookups, repeat.

I have no use for deletions. Although when I want to make a general benchmark, I probably should measure it once, too


Free Pascal actually does have a hashmap. Check this for reference: https://wiki.freepascal.org/Data_Structures,_Containers,_Col...


But that lists like ten classes with map/hash in their name.

None of them stands out as being THE standard Pascal map to use for everything.


There are no silver bullets in the world. If I were you I would take a brief look at the code, and it would become clear what the difference is.


Maybe they wrote it really badly so someone will take offence and rewrite it for them!

Also in my experience `std::deque` is almost always slower than `std::vector`. In theory it shouldn't be, but copying contiguous data is so fast on modern CPUs that the reallocation of `std::vector` (or occasional front-insertion) is very cheap.


In theory copying contiguous data is faster than dealing with pointer indirection.



> They really like std::deque<T> and use it everywhere, even though sizeof(T) is a few dozen bytes at best, so they should use std::vector instead. The data structure behind deque is a list of arrays. While it can amortize continuous adding of elements to the front or back, since the element size is very small they should use vector instead.

Are you saying that access patterns (push/pop front/back) never matter and std::vector is always better than std::deque if the element size is very small? I don't think that this is true in this generality.


in CppCon 2016: Chandler Carruth “High Performance Code 201: Hybrid Data Structures"[1] he says:

"but the standard deque data type is really quite bad. I wouldn't recommend anyone use it for anything. if you look at the implementation constraints, the constraints placed upon its iterators, and its invalidation constraints, it's painted into a very unpleasant corner and it has very few opportunities to be an efficient data structure."

[1] https://www.youtube.com/watch?v=vElZc6zSIXM


For a language whose purpose is high-performance, the std:: containers are an embarrassment. `deque` is slow, `map` is slow - so the language added `unordered_map` which is also slow...

Are any of the containers (other than `vector`) worth using in high-performance code?

I’m sympathetic to the game developers who I’ve heard say “we use C++, but nothing from std::”.


STL is mostly garbage, which is a big reason it's not used in places that actually need performance, both for build times and runtime.

Yes, it's embarrassing. I think it should be noted that the performance mantra is mostly for show with C++, though. The same people who spout it will also blatantly use pessimizing language features just because, when normal, procedural code would've done the job faster and ultimately simpler.

I think the performance overhead of a few things in C++ makes sense, but in general you get performance in C++ by turning things off and abstaining from most of the features. Exceptions complicate mostly everything, so the best thing to do is to turn them off and not rely on anything that uses them, for example.

Modern C++ isn't fast and most of C++ wasn't even before the modern variant was invented. The C "subset" is.


> STL is mostly garbage, which is a big reason it's not used in places that actually need performance, both for build times and runtime.

It is NOT garbage. It is more than sufficient for the 99% of devs that need key-in-hand data structures and good-enough performance (meaning faster than 99% of other programming languages already).

If what you need is sub-microsecond perf, then yes, redefine your data structures.

BTW, you will very likely have to do that in any language anyway, because it is impossible to design fast, forever-living data structures. They (almost) all become obsolete when architectures evolve. Red-black trees were state-of-the-art data structures, taught at school 10 years ago; they are useless garbage nowadays if you seek performance.


> It is more than sufficient for the 99% of devs that need key-in-hand data structures and good-enough performance (meaning faster than 99% of other programming languages already).

I really don't get this argument. If you don't need pedal-to-the-metal performance then why are you using C++ in the first place? (Unless, of course, your answer is "legacy code".)

C++ is being touted as being high performance, but basically every standard data structure besides `std::vector` is garbage for high-performance, pedal-to-the-metal code. And not only data structures - `std::regex`'s performance is bad, `std::unique_ptr` doesn't optimize as well as plain pointers, no vendor has a best-in-class `std::hash` implementation (they're neither DoS-safe nor the fastest), etc.

> BTW, you will very likely have to do that in any language anyway. Because it is impossible to design fast forever-living data structure. They (almost) all become obsolete when architectures evolve.

Do you, though? Rust already replaced their standard hash map implementation with a completely different one which was faster, so it shows that it can be done.


> I really don't get this argument. If you don't need pedal-to-the-metal performance then why are you using C++ in the first place? (Unless, of course, your answer is "legacy code".)

There are many things other than pure compute-bound CPU performance that can drive you to a GC-less language like C++. Fine-grained memory usage is one of them; latency control is another.

> `std::regex`'s performance is bad, `std::unique_ptr` doesn't optimize as well as plain pointers

std::regex is not defensible, especially when there are much better implementations already available (https://github.com/hanickadot/compile-time-regular-expressio...)

However, do you have any benchmark / source for std::unique_ptr? You are the first one I've heard criticize it.

> Do you, though? Rust already replaced their standard hash map implementation with a completely different one which was faster, so it shows that it can be done.

Rust does not have 25 years of code base behind it. We can talk about that again when it is 10 years older.

C++ cannot afford to randomly break the API of one of its core STL components just to hypothetically gain a bit of performance.

It can be done, but it should be done with a deprecation process and an alternative implementation, like it was done for auto_ptr -> unique_ptr.


Rust didn't break the API of HashMap. The trick is that the STL decided that certain algorithmic requirements were a part of the interface. For std::map, the case here is that it must be sorted. Its API also means you can't be polymorphic over a hashing strategy. This constrains possible valid algorithms. Rust made the opposite choices.

And so they did exactly what you say, for that reason. std::unordered_map is a better comparison.


> However, do you have any benchmark / source for std::unique_ptr? You are the first one I've heard criticize it.

Chandler Carruth talks about this at CppCon 2019 [0]. From a quick review he says part of the reason std::unique_ptr isn't zero-cost right now is:

- Objects passed by value (like std::unique_ptr) are passed on the stack instead of in registers, and changing that will be an ABI break

- No destructive move means extra temporaries will be created/retained

[0]: https://youtu.be/rHIkrotSwcc?t=1046


> Exceptions complicate mostly everything, so the best thing to do is to turn them off and not rely on anything that uses them, for example.

Look forward to all your users ignoring your return codes and never calling valid() on your objects, which can only signal a failed construction that way.

Also the impact of exceptions on performance is overblown, even in high perf situations:

https://news.ycombinator.com/item?id=20342183

Throwing is slow, but throwing is supposed to be rare. Don't use it as flow control.


About 1% of developers actually need to deliver µs fast code.

The remaining 99% are happy to be able to deliver fast enough portable code without having to reinvent data structures all the time.


Array is great as far as I know.


std::array is great, except it's annoying to statically initialize. I really wish make_array or to_array had become non-experimental a lot sooner (to_array finally landed in C++20 at least).

But yeah it does mean you can finally stop doing that silly `sizeof(a)/sizeof(a[0]);` trick.
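
For illustration (C++20 for to_array; the names are made up):

  #include <array>
  #include <cstddef>

  int legacy[] = {1, 2, 3, 4};
  constexpr std::size_t legacy_len = sizeof(legacy) / sizeof(legacy[0]);  // the old trick

  constexpr auto modern = std::to_array({1, 2, 3, 4});  // std::array<int, 4>, size carried in the type

  static_assert(legacy_len == modern.size());

  int main() {}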


Thanks for the reference. But this is still specific to access patterns: You can use a deque as a queue without iterating, so iterators and their invalidation constraints should not matter.



