
> LLVM history: Back then we recognized some really interesting benchmarks and we didn’t recall anybody trying to really benchmark sorts on different data patterns for standard libraries.

Timsort https://en.wikipedia.org/wiki/Timsort :

> In the worst case, Timsort takes O(n log n) comparisons to sort an array of n elements. In the best case, which occurs when the input is already sorted, it runs in linear time, meaning that it is an adaptive sorting algorithm. [3]

> It is advantageous over Quicksort for sorting object references or pointers because these require expensive memory indirection to access data and perform comparisons and Quicksort's cache coherence benefits are greatly reduced. [...]

> Timsort has been Python's standard sorting algorithm since version 2.3 [~2002]. It is also used to sort arrays of non-primitive type in Java SE 7,[4] on the Android platform,[5] in GNU Octave,[6] on V8,[7] Swift,[8] and Rust.[9]

Sorting algorithm > Comparison of algorithms https://en.wikipedia.org/wiki/Sorting_algorithm#Comparison_o...

Schwartzian transform https://en.wikipedia.org/wiki/Schwartzian_transform :

> In Python 2.4 and above, both the sorted() function and the in-place list.sort() method take a key= parameter that allows the user to provide a "key function" (like foo in the examples above). In Python 3 and above, use of the key function is the only way to specify a custom sort order (the previously supported cmp= parameter that allowed the user to provide a "comparison function" was removed). Before Python 2.4, developers would use the lisp-originated decorate–sort–undecorate (DSU) idiom,[2] usually by wrapping the objects in a (sortkey, object) tuple
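
A minimal side-by-side of the key= approach and the DSU idiom (the names and data here are made up for illustration):

    people = [("carol", 33), ("alice", 25), ("bob", 29)]

    # Python 2.4+: pass a key function.
    by_age = sorted(people, key=lambda p: p[1])

    # Pre-2.4 decorate-sort-undecorate: wrap each object in a (sortkey, index, object)
    # tuple (the index keeps ties stable and avoids comparing the objects themselves),
    # sort the tuples, then strip the keys back off.
    decorated = [(p[1], i, p) for i, p in enumerate(people)]
    decorated.sort()
    undecorated = [p for _key, _i, p in decorated]

    assert by_age == undecorated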

Big-O Cheatsheet https://www.bigocheatsheet.com/

Quicksort in Python 7 ways (and many other languages) on RosettaCode: https://rosettacode.org/wiki/Sorting_algorithms/Quicksort#Py...




Thanks, I'll remove the note about "recalling"


xeus-cling didn't exist back then: https://github.com/jupyter-xeus/xeus-cling#a-c-notebook

https://github.com/fffaraz/awesome-cpp#debug

A standard way to benchmark and chart {sorting algorithms, web frameworks} would be great.

The TechEmpower "framework overhead" benchmarks might have at least average case sorting in there somewhere: https://www.techempower.com/benchmarks/#section=data-r20&hw=...


awesome-algorithms https://github.com/tayllan/awesome-algorithms#github-librari...

awesome-theoretical-computer-science > Machine Learning Theory, Physics; Grover's; and surely something is faster than Timsort: https://github.com/mostafatouny/awesome-theoretical-computer...


Christ, the description on that second repo is so comically terribly spelled that I can’t figure out whether it’s a joke:

> The interdicplinary of Mathematics and Computer Science, Distinguisehed by its emphasis on mathemtical technique and rigour.

(Perhaps a witty satire on the reputation of mathmos for being godawful at anything literary...)


IIRC Rust and Swift have modified timsort implementations that remove the "galloping" strategy.


Rust has "an adaptive, iterative merge sort inspired by timsort" to implement the stable sort() method if you have an allocator, and a pattern-defeating quicksort to implement unstable_sort() which is provided even if you don't have an allocator (no_std embedded environments).


I don't understand why you would need Timsort. If the pivot element is chosen randomly, Quicksort is always (or with probability bordering on certainty) O(n log n).

Update: I confused Timsort with IntroSort. I should have googled before posting ...


> I don't understand why you would need Timsort. If the pivot element is chosen randomly, Quicksort is always (or with probability bordering on certainty) O(n log n).

For starters, it's a stable sort, which excludes quicksort right out of the gate.

Second, the primary hypothesis of timsort is that real-world data is non-random: timsort is biased toward partially or almost-sorted sequences, which it can sort in as little as O(n). That would be the point westurner was making; timsort's creation is very much predicated on "really benchmark[ing] sorts on different data patterns".

Third, for very short sequences (also a common situation) it's an insertion sort, which is extremely efficient due to its low overhead (even more so when combined with the adaptivity that insertion sort also has).
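
The stability point is easy to see with Python's sorted(), which is a timsort (the data here is made up for illustration):

    # sorted() is stable: records that tie on the sort key keep their input order.
    orders = [
        {"customer": "alice", "total": 30},
        {"customer": "bob",   "total": 10},
        {"customer": "carol", "total": 30},
    ]
    by_total = sorted(orders, key=lambda o: o["total"])
    # alice still precedes carol among the total == 30 ties.
    assert [o["customer"] for o in by_total] == ["bob", "alice", "carol"]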


> For starters, it's a stable sort, which excludes quicksort right out the gates.

Negative, it excludes in-place quicksort but out-of-place quicksort can be stable just fine.

> Third, for very short sequences (also a common situation) it's an insertion sort, which is extremely efficient due to the low overhead (even more so combined with the adaptivity which insertion sort also has).

Every modern quicksort implementation also does this (or a similarly efficient smallsort).
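
Roughly what both points look like as an illustrative, non-optimized Python sketch (not any particular library's implementation, and the cutoff value is arbitrary): out-of-place partitioning keeps equal keys in input order, and a small-size cutoff falls back to insertion sort:

    import random

    SMALL = 16  # arbitrary cutoff; below this, insertion sort's low overhead wins

    def insertion_sort(xs, key):
        out = []
        for x in xs:
            i = len(out)
            # Insert after the last element whose key is <= key(x), so equal
            # keys keep their input order (stable).
            while i > 0 and key(out[i - 1]) > key(x):
                i -= 1
            out.insert(i, x)
        return out

    def stable_quicksort(xs, key=lambda x: x):
        if len(xs) <= SMALL:
            return insertion_sort(xs, key)
        pivot = key(random.choice(xs))
        # Out-of-place three-way partition: each bucket preserves input order,
        # so the overall sort is stable.
        less    = [x for x in xs if key(x) < pivot]
        equal   = [x for x in xs if key(x) == pivot]
        greater = [x for x in xs if key(x) > pivot]
        return stable_quicksort(less, key) + equal + stable_quicksort(greater, key)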


Hi, you are ignoring the second point, and for the other two points you are referring to particular extensions of quicksort. Quite possibly you can extend quicksort similarly to how timsort extended mergesort, but if that were commonplace nobody would care about timsort.


> and for the other two points you are referring to particular extensions of quicksort

Maybe for the third point, but once again, every implementation does this. For the first point, absolutely not. In-place partitioning may be the only variant usually described, but out-of-place partitioning is absolutely still quicksort, not an extension.

> Quite possibly you can extend quicksort similar to how timsort extended mergesort

Please see my top comment in this thread - I have done exactly that. Or rather, I have extended powersort (which is very similar to timsort) with quicksort.


I ported Timsort to the Windows kernel back in college. Good times!



