
One of the big ideas in the Standard Template Library for C++ was that it should specify big-O complexity values.

The idea is that if I know algorithm A is O(N) and algorithm B is O(N), whereas algorithm C is O(log N), then I know it's stupid to use algorithms A and B together just to avoid C, because C's complexity is actually lower.
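(An illustrative example of my own, not one from the standard: on sorted data, reaching for the O(N) std::find when the O(log N) std::lower_bound is available is exactly the mistake those complexity specs are supposed to let you avoid.)

    #include <algorithm>
    #include <vector>

    // Illustrative only: both answer "does v contain x?" for a sorted vector.
    bool contains_linear(const std::vector<int>& v, int x) {
        return std::find(v.begin(), v.end(), x) != v.end();   // O(N)
    }

    bool contains_log(const std::vector<int>& v, int x) {
        auto it = std::lower_bound(v.begin(), v.end(), x);    // O(log N)
        return it != v.end() && *it == x;
    }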

In practice there's a big problem: big-O isn't good enough for real-world engineering decisions. So we need to measure anyway, and if we're measuring, why bother specifying this in the STL?

There's also a small problem: this forbids, at least notionally, implementations from choosing better options which strictly don't meet the documented criteria. But that problem counts as small because in practice we'll just pretend not to notice and leave the resulting bug reports from pedants in the fix-never pile.

I would disagree. The purpose of big O in the STL is not to let you compare different algorithms, but to let you reason about how an algorithm will scale with input size. If I test perf at a size of 1000 and the algorithm is declared to be linear, I have a pretty good idea whether it's the right tool for the job.
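A rough sketch of that reasoning (my own toy benchmark, with the usual caveat that constants and caches bend straight lines in practice):

    #include <chrono>
    #include <cstdio>
    #include <numeric>
    #include <vector>

    int main() {
        std::vector<long> v(1000);
        std::iota(v.begin(), v.end(), 0L);

        auto t0 = std::chrono::steady_clock::now();
        volatile long sum = std::accumulate(v.begin(), v.end(), 0L); // specified O(N)
        auto t1 = std::chrono::steady_clock::now();
        (void)sum;

        long long ns = std::chrono::duration_cast<std::chrono::nanoseconds>(t1 - t0).count();
        // If the O(N) guarantee holds, 1000x the input should cost roughly 1000x the time.
        std::printf("N=1000: %lld ns; linear extrapolation to N=1000000: %lld ns\n",
                    ns, ns * 1000);
    }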

Do you have any examples of “better” algorithms that the STL is not able to use? Another idea in STL is to provide many algorithms, which is why there are 3 sorts.


> Another idea in STL is to provide many algorithms, which is why there are 3 sorts.

Where do you see three sorts with different behaviour? I see two: an unstable sort just named "sort", and a stable sort.

sort performs O(N⋅log(N)) comparisons and so does the stable sort, although to be fair the latter also promises to work (more slowly, O(N⋅log²(N)) instead) if there's no scratch space.

We don't get to ask for the no-scratch variant, so we can't count that separately.
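For reference, the two look like this side by side (a sketch of the calling convention, not stdlib internals):

    #include <algorithm>
    #include <string>
    #include <utility>
    #include <vector>

    int main() {
        std::vector<std::pair<int, std::string>> v{
            {2, "b"}, {1, "a"}, {2, "a"}, {1, "b"}};
        auto by_key = [](const auto& l, const auto& r) { return l.first < r.first; };

        // Unstable: O(N log N) comparisons, equal keys may swap places.
        std::sort(v.begin(), v.end(), by_key);

        // Stable: equal keys keep their relative order. O(N log N) with
        // scratch space, degrading to O(N log^2 N) without it.
        std::stable_sort(v.begin(), v.end(), by_key);
    }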

And these big-O values are totally uninteresting because they're typical for every modern sort: when people announce a shiny new general-purpose sort, it has the same complexity numbers. Does your STL have it? Who knows, the standard doesn't specify.

And that gets back to my earlier point: Introsort, the typical sort you'd find in a C++ program today, is relatively modern, and C++ was standardised in 1998. So plain quicksort was often shipped in a "complying" C++ stdlib, because hey, that's good, right? Well, no. Quicksort's worst case is O(n²), so it was actually forbidden by the STL's documentation, but it was shipped anyway because for many real-world uses it's fast.


> Where do you see three sorts

stable_sort, sort, and sort_heap

The creator Alex Stepanov also included insertion_sort (used inside sort) but it was rejected by the standards committee.

That suggests the idea that complexity is primarily for comparison is wrong, because then why would anyone ever pick insertion_sort? Of course there are real-world cases where it is the right choice. But if you need the complexity guarantee, it's not.

> shiny new general purpose sort

I don't want it in the standard until it's proven useful and has some longevity, not just a win in a microbenchmark. "Shiny" and "new" are exactly the wrong adjectives.

Why can't you include a header?

> quicksort

Introsort is simply a variant of quick sort with heap sort as a fallback to avoid the worst case, and insertion sort at the lowest level.
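A toy sketch of that dispatch logic (my own simplification, not any real stdlib's code; the cutoff and pivot choice here are assumptions, real implementations tune both):

    #include <algorithm>
    #include <cmath>
    #include <iterator>

    template <typename It>
    void introsort_impl(It first, It last, int depth_limit) {
        constexpr int kSmallRange = 16;  // cutoff is a guess; real impls vary
        while (last - first > kSmallRange) {
            if (depth_limit-- == 0) {
                // Quicksort is degenerating: fall back to heap sort, O(N log N).
                std::make_heap(first, last);
                std::sort_heap(first, last);
                return;
            }
            auto pivot = *(first + (last - first) / 2);  // naive pivot choice
            It split = std::partition(first, last,
                                      [&](const auto& x) { return x < pivot; });
            introsort_impl(split, last, depth_limit);  // recurse on one side...
            last = split;                              // ...loop on the other
        }
        // Small ranges: insertion sort, cheap for short or nearly-sorted data.
        for (It i = first; i != last; ++i)
            std::rotate(std::upper_bound(first, i, *i), i, std::next(i));
    }

    template <typename It>
    void introsort(It first, It last) {
        if (last - first > 1)
            introsort_impl(first, last, 2 * (int)std::log2(last - first));
    }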

Anyone who tried to pass off naive quicksort simply wasn't educated about the topic, so it's good that they were not standards-compliant.


I had never seen std::sort_heap before, that is kinda cool.

I'm torn about whether this constitutes a sort, though. Like, yeah, in some sense this is the meat of a heap sort, but it's the caller's job to provide the data as a heap first, and of course this is C++, so if we screw that up it's Undefined Behaviour.
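For anyone else meeting it for the first time, the calling convention is roughly this (sketch):

    #include <algorithm>
    #include <vector>

    int main() {
        std::vector<int> v{3, 1, 4, 1, 5, 9, 2, 6};

        // Caller's job: the range must already satisfy the heap property.
        std::make_heap(v.begin(), v.end());

        // Pops the max to the back repeatedly; O(N log N) comparisons, and
        // Undefined Behaviour if the range wasn't a heap to begin with.
        std::sort_heap(v.begin(), v.end());  // v is now sorted ascending
    }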

> I don't want it in the standard until it's proven useful and has some longevity

I wasn't advocating for such things to be in the standard but to be in the implementation. One of the worst sins of WG21 has been standardising implementations, as they did with std::unordered_map and continued to do in newer C++ versions.

And you're wrong: these people knew exactly what they were doing. You've probably used a stdlib which did this (such as libc++), and the reason was very simple: despite the big-O complexity, in practice quicksort was fast, while heap sort, though compliant, was usually slower. Benchmarks matter. Big-O not so much.

Like I said, Introsort was new, and as you've explained, you want to wait until things have "some longevity" before standardising them, so the stdlibs didn't take Introsort at first. LLVM took Introsort for libc++ in 2021; Microsoft landed a compliant sort some years earlier. Neither complied with C++ 98 initially.

I mean, the big joke for LLVM is that the bug ticket about their sort complexity, which sat in a pile unresolved until after Joe Biden became president, was filed by Orson Peters. It's like realising the guy who pointed out that your physics textbook has the formula for mass-energy equivalence wrong is Albert Einstein. Gee, I wonder if Orson has any ideas for how to fix this libc++ bug...


The most infamous case is that std::unordered_map is essentially required to be a bucket-based (separate-chaining) hash table, whereas it's usually better to have a probe-based (open-addressing) hash table instead.
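You can see the chaining requirement baked into the interface itself; these per-bucket observers (plus reference stability across rehash) are what make open addressing effectively non-conforming:

    #include <cstdio>
    #include <unordered_map>

    int main() {
        std::unordered_map<int, int> m{{1, 10}, {2, 20}, {3, 30}};

        // The standard's bucket-level API: only really implementable
        // with separate chaining.
        std::printf("bucket_count: %zu\n", m.bucket_count());
        std::printf("key 1 is in bucket %zu\n", m.bucket(1));
        std::printf("that bucket holds %zu element(s)\n", m.bucket_size(m.bucket(1)));
        std::printf("load_factor %.2f, max_load_factor %.2f\n",
                    m.load_factor(), m.max_load_factor());
    }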


The thing that's fascinating for std::unordered_map is that it wasn't standardised in C++ 98. These hash tables are a C++ 11 feature.

They feel like a 1980s feature: if there were a hash table in C's stdlib, this is the hash table you'd expect. But it's not in C, and it isn't from the 1980s; it's from the same year Osama bin Laden was killed by US special forces.

Rust already existed when this data structure was standardised. Not Rust 1.0, but still, that's insane: it's like realising millions of older Spanish people were alive before their country was a democracy. Many are old enough to remember the prior regime.


Well, if you want to be pedantic, std::unordered_map first appears in the C++ working draft in 2006 (N2009), which is before Rust was first publicly announced.


I do want to be pedantic, but not in a fair way :)

The C++ 11 standard was not required to standardise broken garbage from many years ago.

WG21 pulled C++ 0x Concepts (a much more powerful feature than the "Concepts Lite" which landed in C++ 20) because they argued it might take too long to implement, ridiculous as that claim is, so there's no reason to keep something terrible just because it's in a draft paper.


Hash maps are extremely difficult to make general which is probably why they were never included in C++98. unordered_map is a disaster.


It's extremely hard, so with 13 extra years they managed to do an astoundingly bad job?

It's hard to even believe this was trying to be general.

Notice that not only the hash tables themselves but the infrastructure is garbage too. In 1998 the C++ STL was one of the earliest generic programming APIs; nobody knew how you should implement the generality of something like hashing, fine.

But by 2011 people had tried several things, and this (the std::hash templated callable which returns a machine-word integer) is not even a reasonable attempt at a known-good design.
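Concretely, the mandated shape is just this; everything about combining fields, seeding, and output quality is left to the user (the xor/shift combiner below is my own ad-hoc choice, not anything the standard blesses):

    #include <cstddef>
    #include <functional>

    struct Point { int x, y; };

    // std::hash<T>: a stateless callable returning one machine-word integer.
    // No seed, no byte-stream interface, no way for a container to salt it.
    template <>
    struct std::hash<Point> {
        std::size_t operator()(const Point& p) const noexcept {
            std::size_t h1 = std::hash<int>{}(p.x);
            std::size_t h2 = std::hash<int>{}(p.y);
            return h1 ^ (h2 << 1);  // ad-hoc combiner; quality is the user's problem
        }
    };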
