
How is it that programmers are still discovering state-of-the-art algorithms?

I just assumed Einstein-tier people had discovered all the optimal ways to do things by now.

Do you have a process for coming up with these improved algorithms, or does it just start from a passing thought in your mind?




The field of computer science is at best 180 years old, and the first compiler is a mere 70 years old. Based on the history of every other field of science it seems safe to assume that the majority of notable algorithms are still undiscovered.

And that's ignoring that "optimal" is a shifting target. What's optimal on a 1960s computer where RAM is as fast as registers may not be optimal on a modern CPU with instruction-level parallelism and RAM that's orders of magnitude slower than registers.
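To make the shifting target concrete, here's a rough sketch (illustrative only; exact numbers vary wildly by machine). Both loops below do identical O(n^2) work, but one walks memory contiguously while the other strides across cache lines. On a machine where RAM keeps pace with the CPU they'd be equivalent; on current hardware the strided version is typically several times slower.

    // Illustrative micro-benchmark: same work, different memory access pattern.
    #include <chrono>
    #include <cstddef>
    #include <cstdio>
    #include <vector>

    int main() {
        const std::size_t n = 4096;
        std::vector<int> m(n * n, 1);

        auto time_it = [&](auto body) {
            auto t0 = std::chrono::steady_clock::now();
            long long sum = body();
            auto t1 = std::chrono::steady_clock::now();
            auto ms = std::chrono::duration_cast<std::chrono::milliseconds>(t1 - t0).count();
            std::printf("sum=%lld  %lld ms\n", sum, static_cast<long long>(ms));
        };

        // Row-major: consecutive addresses, every loaded cache line is fully used.
        time_it([&] {
            long long s = 0;
            for (std::size_t i = 0; i < n; ++i)
                for (std::size_t j = 0; j < n; ++j)
                    s += m[i * n + j];
            return s;
        });

        // Column-major: a stride of n ints per access, mostly cache misses.
        time_it([&] {
            long long s = 0;
            for (std::size_t j = 0; j < n; ++j)
                for (std::size_t i = 0; i < n; ++i)
                    s += m[i * n + j];
            return s;
        });
    }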


I'm of the impression that computer science isn't considered a "science" field despite the name. Anyway, stuff like this is largely just math; other science fields often move slower because you're at the mercy of natural experiments and expensive instruments (e.g., waiting for some burst of gravitational waves to arrive from some black hole collision that happened billions of years ago).


Computer Science is neither a science, nor is it about computers. It is an awful name. Computational mathematics would be significantly better.

The worst part is that most people pursuing a computer science degree actually want to learn software engineering, so most curricula nowadays disappoint both the people who want to become software engineers ("why do I have to learn about formal grammars and automata?") and those who want to become computational mathematicians ("why do I have to do a course on agile?").


Most of our day jobs are closer to engineering, but there is a very theoretical side of computer science, and algorithm design certainly fits there. As you said it has a lot of similarities to math, and there's no consensus on whether math is a science, so I won't comment on the "science-ness" of computer science.

But I can say that mathematics is an ancient "science" which has been studied by great scholars for millennia, yet many of the things we take for granted today are relatively recent: modern set theory was introduced about 150 years ago by Cantor and Dedekind, meaning things like Russell's Paradox [1], Gödel's Incompleteness Theorem [2] and the Peano Axioms [3] were all discovered after that, despite being taught in many (university-level) introductory math courses today. Rings are also relatively young [4]. In comparison, imaginary numbers are about 500 years old.

1: https://brilliant.org/wiki/russells-paradox/

2: https://en.wikipedia.org/wiki/G%C3%B6del%27s_incompleteness_...

3: https://en.wikipedia.org/wiki/Peano_axioms

4: https://en.wikipedia.org/wiki/Ring_(mathematics)


I think of "computer science" somewhat like "engine science", AKA thermodynamics: they're both intimately tied to practical engineering (hardware/software; steam locomotives) and they're both intimately tied to abstract mathematics (computability, complexity theory; statistics, information).

An easy way to reconcile this is to consider that everything is an engine (~ 'a system that transforms energy'); and likewise that everything is a computer (~ 'a system that turns inputs into outputs'). Each field offers a different perspective on different aspects of the world :)


To quote my CS professor, “You will notice that most of the algorithms we will cover, and most of those in the white book, are so short and simple that they fit on a single page and usually less. Real-world code is longer, but there is a world out there of algorithms which probably aren’t this short and simple but we don’t know them yet.” (1995 or thereabouts)

If anything, I think he missed a key point: algorithmic improvement is actually only interesting in a few very specific places.

I think pdqsort actually shows this to be true - it uses added complexity to improve on things that really do fit in a few lines of pseudocode.


There is definitely a trade-off between brevity and efficiency/practicality. A classic example is "The Fastest and Shortest Algorithm for All Well-Defined Problems":

> An algorithm M is described that solves any well-defined problem p as quickly as the fastest algorithm computing a solution to p, save for a factor of 5 and low-order additive terms

Spoiler alert: "algorithm M" is a brute-force search; the "low-order additive terms" in its time complexity come from brute-force searching for provably faster programs, and if one is found, we restart with it. All the time spent searching for proofs is independent of the input size (it depends on the problem, e.g. "sorting", but not on which particular list we've been given); hence it's technically just a constant 'warm-up time'!

http://hutter1.net/ai/pfastprg.htm


Efficient O(n log n) sorting was "solved" by John von Neumann in the 1940s; everything since then is more a matter of engineering than science:

https://en.wikipedia.org/wiki/Merge_sort
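For reference, the whole thing really does fit on a page. A minimal top-down sketch (O(n log n) comparisons, O(n) auxiliary buffer; essentially what von Neumann described in 1945):

    #include <cstddef>
    #include <cstdio>
    #include <vector>

    // Sorts a[lo, hi) using buf as scratch space.
    void merge_sort(std::vector<int>& a, std::vector<int>& buf,
                    std::size_t lo, std::size_t hi) {
        if (hi - lo < 2) return;
        std::size_t mid = lo + (hi - lo) / 2;
        merge_sort(a, buf, lo, mid);
        merge_sort(a, buf, mid, hi);
        // Merge the two sorted halves into buf, then copy back.
        std::size_t i = lo, j = mid, k = lo;
        while (i < mid && j < hi) buf[k++] = (a[j] < a[i]) ? a[j++] : a[i++];
        while (i < mid) buf[k++] = a[i++];
        while (j < hi)  buf[k++] = a[j++];
        for (std::size_t t = lo; t < hi; ++t) a[t] = buf[t];
    }

    void merge_sort(std::vector<int>& a) {
        std::vector<int> buf(a.size());
        merge_sort(a, buf, 0, a.size());
    }

    int main() {
        std::vector<int> v{5, 3, 8, 1, 9, 2};
        merge_sort(v);
        for (int x : v) std::printf("%d ", x);  // prints: 1 2 3 5 8 9
    }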


> How is it that programmers are still discovering state-of-the-art algorithms? I just assumed Einstein-tier people had discovered all the optimal ways to do things by now.

Because it's a moving target. There isn't a fixed 'optimal' sorting algorithm.

We've had worst case O(n log n) sorting algorithms for decades, since at least the '50s, possibly the '40s. pdqsort isn't more 'optimal' in the algorithmic sense than merge sort is, they're both worst/expected O(n log n) algorithms.

If RAM is low-latency and your CPU isn't superscalar, heapsort is going to take less time than quicksort. With high-latency RAM and speculatively executing CPUs, quicksort is going to take less time than heapsort. So every few hardware generations we need to sit down and benchmark everything.

pdqsort is based on the principle that the real world is different from the academic world. There are no frictionless spherical cows in a vacuum. In academic research, we assume that the data has been contrived to break your algorithm, or that it's randomly distributed. In the real world, data is almost never randomly distributed. A lot of the data that gets plugged into your standard library of choice's sorting algorithm is already sorted. Or sorted in reverse. Or sorted with one random element plopped onto the end. Or sorted with one element randomly out of place in the middle. pdqsort is designed to account for that.
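To give a flavour of what "account for that" looks like, here's the simplest possible version of the idea (this is not pdqsort, just a sketch): spend one cheap O(n) pass catching the patterns that dominate real inputs before paying for the general sort.

    #include <algorithm>
    #include <vector>

    void pattern_aware_sort(std::vector<int>& a) {
        if (std::is_sorted(a.begin(), a.end()))
            return;                               // already sorted: O(n), done
        if (std::is_sorted(a.rbegin(), a.rend())) {
            std::reverse(a.begin(), a.end());     // reverse-sorted: O(n), done
            return;
        }
        std::sort(a.begin(), a.end());            // general case: O(n log n)
    }

pdqsort does this far more cleverly: it detects partitions that are already in order during partitioning, deliberately breaks up adversarial patterns, and falls back to heapsort so the worst case stays O(n log n).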


> I just assumed Einstein-tier people had discovered all the optimal ways to do things by now.

This is the answer to your question: almost everyone makes the same assumption you do.

There are a surprising number of problems in computer science that no one works on because everyone assumes that other clever people are. Relatedly, people assume the set of possible solutions has been exhaustively explored because there have been no incremental improvements in decades. Or people assume the scope of optimality of a well-known solution is much broader than it actually is.

The vast majority of people in software never test or revisit these assumptions. If you do test these assumptions, you will be surprised by how often you find low-hanging fruit.


In part, the optimal algorithm changes as the hardware changes. In the 1980s, for instance, the CPU was about as fast as a RAM access. Today the CPU is on the order of 200 times quicker than a RAM access, which means algorithms need to be designed to work well with CPU caches to perform well. There is also much MORE memory available, so in some cases it now makes sense to spend more memory to get more speed. Computers often have many cores as well, which also changes the equation at times.
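One small example of the memory-for-speed trade (a deliberately old-fashioned sketch; modern x86 CPUs have a POPCNT instruction and C++20 has std::popcount, which is exactly the point about hardware moving the goalposts): spend 256 bytes on a lookup table so counting the set bits of each byte costs one load instead of an 8-step loop.

    #include <array>
    #include <cstdint>
    #include <cstdio>

    // Built once: 256 bytes of memory spent to avoid a bit loop per byte.
    static const std::array<std::uint8_t, 256> kPopcount = [] {
        std::array<std::uint8_t, 256> t{};
        for (int v = 0; v < 256; ++v)
            for (int b = 0; b < 8; ++b)
                t[v] += (v >> b) & 1;
        return t;
    }();

    int popcount32(std::uint32_t x) {
        return kPopcount[x & 0xff] + kPopcount[(x >> 8) & 0xff] +
               kPopcount[(x >> 16) & 0xff] + kPopcount[x >> 24];
    }

    int main() {
        std::printf("%d\n", popcount32(0xF0F0F0F0u));  // 16
    }

Whether the table actually wins depends on the machine: if it falls out of cache, recomputing can be cheaper, which is the same benchmarking treadmill all over again.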



