The Most Important Algorithms (risc.jku.at)
187 points by iamanet on July 5, 2010 | 48 comments



there's quite a lot of structure amongst these algorithms that the list misses out on:

- dynamic programming comes from solving Bellman's equation and Q-learning is an approximate means of doing dynamic programming.

- It's possible to do beam a* search, where beam search, a* search, and best-first (greedy) search are special cases.

- in the continuous domain, greedy search is essentially gradient ascent.

- you can use discrete differentiation to find the gradients for gradient ascent (actually, i would replace the first-order method suggestion of this list with newton-raphson or some quasi-newton method like BFGS).

- EM ties some bits together.

- if you replace the max of the viterbi algorithm with a summation, then you get the sum-product algorithm, which is essential in the E step of EM (a tiny sketch of this max-vs-sum swap follows at the end of this comment). if you use viterbi instead of sum-product, you get something known as zero-temperature EM, which is an approximate form of EM.

- the M step of EM is typically just gradient ascent!

- you can use max flow to do a similar thing to viterbi on certain graphs

and so on...
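
a rough python sketch of the max-vs-sum point on a toy HMM (the numbers and names here are made up, purely for illustration): the only difference between viterbi and the sum-product/forward pass is whether you combine incoming scores with max or with sum.

    # toy HMM, made-up numbers -- purely illustrative
    states = ['rain', 'sun']
    start  = {'rain': 0.6, 'sun': 0.4}
    trans  = {'rain': {'rain': 0.7, 'sun': 0.3}, 'sun': {'rain': 0.4, 'sun': 0.6}}
    emit   = {'rain': {'walk': 0.1, 'shop': 0.4, 'clean': 0.5},
              'sun':  {'walk': 0.6, 'shop': 0.3, 'clean': 0.1}}

    def score(obs, combine):
        # viterbi if combine=max, sum-product (forward algorithm) if combine=sum
        s = {x: start[x] * emit[x][obs[0]] for x in states}
        for o in obs[1:]:
            s = {x: combine(s[p] * trans[p][x] for p in states) * emit[x][o] for x in states}
        return combine(s.values())

    obs = ['walk', 'shop', 'clean']
    print(score(obs, max))  # probability of the single best state sequence
    print(score(obs, sum))  # total probability of the observations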


1) Just do EM with parameters in an arbitrary semiring and you generalize nicely to all the special cases you mentioned (and more)

2) I see how all of EM is coordinate ascent, but I don't see how the M step is gradient ascent. Where the gradients at?

3) Second-order optimization methods can be pretty tricky and fall apart for large problems. Stochastic gradient descent ftw.


I think the dynamic programming meant here is more general. They mean saving results of a recursively formulated algorithm in a table so that you don't have to recompute them. This can be used for example for finding longest common subsequences, but it is much more general.
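
For example, a minimal table-filling sketch for the length of the longest common subsequence (just to illustrate the "store subproblem results" idea; not optimized for space):

    def lcs_length(a, b):
        # table[i][j] = length of the LCS of a[:i] and b[:j]
        table = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
        for i, x in enumerate(a, 1):
            for j, y in enumerate(b, 1):
                if x == y:
                    table[i][j] = table[i - 1][j - 1] + 1
                else:
                    table[i][j] = max(table[i - 1][j], table[i][j - 1])
        return table[-1][-1]

    print(lcs_length("ABCBDAB", "BDCABA"))  # -> 4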

Can you elaborate on the general search procedure?

Is there a good resource that ties algorithms together like you've done here, but less concise?


they do mean dynamic programming in a more general sense: but dynamic programming was invented originally to solve Bellman's equation. it turns out many other problems have a similar structure, which is quite surprising! before bellman's seminal work in the 1940s, it wasn't known how to solve these problems efficiently. indeed, you can often use something like Q-learning to find approximate solutions to dynamic programming problems even faster.

the general search procedure is just A* search but with the beam search pruning each time you expand the fringe of exploration.

i don't know a good source, i picked this up from a bunch of courses/research.


Thanks! You should consider writing an article about this :)


The search chapter in PAIP has an excellent description of A* search, beam search and best first search that shows the relationship between the algorithms. The code is available on-line (http://norvig.com/paip/search.lisp).


This reads like a textbook, i.e. it's algorithms that are highly useful to CS professors and students but not necessarily to practitioners. If I had to pick a top-10 list based on my professional experience, it would be:

1. Tree traversal. This comes up all the time, from directory walks to DOM traversal to code analysis to folds over abstract data types. It's also something that you need to know and can't always rely on having a library function for, because many things exhibit tree-like structure without actually being instances of your language's Tree data type.

2. Hashing. Obviously as a base for hashtables (which are often your dictionary data structure of choice), but also as a general technique for generating a small fingerprint from a large set of data.

3. Statistics - mean/median/mode, but also things like confidence intervals, regressions, sample sizes, how to make sure your populations are unbiased, etc. You are evaluating your product, right? Knowing how to make good inferences from data is critical, particularly since there're lots of ways you can do it wrong.

4. Sorting. This can usually be hidden behind a library function, but it's useful to understand because many other algorithms have very different (and often better) performance characteristics if the input data is kept sorted. Binary search falls into this bullet point too.

5. Data compression algorithms. Another one that you'll rarely have to directly implement, but knowing their characteristics helps you make good speed/space tradeoffs for the rest of the system.

6. Bloom filters. This is one that I never learned in school but tends to be incredibly useful when dealing with massive data sets. Oftentimes, you want to be able to quickly reject elements if they're not in a set, but don't really care about it taking a while if they are in a set (because you expect that many more elements will be outside of the set than inside it). A Bloom filter is one of the fastest, most space-efficient ways to do this.

7. Topological sort. Dependency graphs come up all the time in practical programming problems, and usually you want a way to linearize them into a list of tasks that you can perform in order. It's also a shame that many languages' standard libraries don't have a topological sort function, so oftentimes you will have to implement this one yourself (a minimal sketch follows this list).

8. Support vector machines. These are your bread & butter machine-learning classifiers, and are really nice to know when you've got a bunch of data and want to try to automate some rote classification job.

9. Physics simulations. I've found it surprising how useful my intro physics course knowledge has been. Things like vectors, position vs. velocity vs. acceleration, damped harmonic oscillators, FFTs, etc. It's most useful when building UIs and games - oftentimes, you can make a UI feel significantly more natural by adding easing functions that behave like acceleration when you ease in and frictional damping when you ease out.

10. Unification. I'm probably biased because a lot of my hobby programming is in compilers & type systems, but unification comes up all the time in compilers, and is a really general technique for equation solving.
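
Since #7 is the one you most often end up writing yourself, here is a minimal sketch of it (Kahn's algorithm; the shape of the dependency dict is just an assumption for the example):

    from collections import deque

    def toposort(deps):
        # deps maps each task to the list of tasks it depends on (Kahn's algorithm)
        tasks = set(deps) | {d for ds in deps.values() for d in ds}
        indeg = {t: 0 for t in tasks}              # number of unmet dependencies
        users = {t: [] for t in tasks}             # dependency -> tasks waiting on it
        for task, ds in deps.items():
            for d in ds:
                indeg[task] += 1
                users[d].append(task)
        ready = deque(t for t in tasks if indeg[t] == 0)
        order = []
        while ready:
            t = ready.popleft()
            order.append(t)
            for u in users[t]:
                indeg[u] -= 1
                if indeg[u] == 0:
                    ready.append(u)
        if len(order) != len(tasks):
            raise ValueError("dependency cycle detected")
        return order

    print(toposort({'app': ['lib', 'config'], 'lib': ['config'], 'config': []}))
    # -> ['config', 'lib', 'app']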


Nicely observed, but I have two remarks, just for the record (again, I know):

i.) this is a list of/for CS professors; RISC expands to Research Institute for Symbolic Computation, which is a CS/Math department of Prof. Buchberger, who discovered the famous Groebner bases.

ii.) AFAIR (too many years gone since), the A* algorithm is pretty much a generalization of graph traversal. I remember my professor showing us how either prepending or appending the nodes to visit to a list gives depth-first or breadth-first traversal, respectively. So I think in a way at least both lists agree on the most important algorithm...

edit: obviously I forgot to state that a tree is a graph, too.


ii.) I'd noticed a while ago that all graph traversal algorithms follow the same pattern. There is a white set of undiscovered nodes, a grey set of discovered-but-unvisited nodes, and a black set of visited nodes. To execute the graph traversal, you push a start node into the grey set, then pop a node off the grey set, move it to the black set, add any of its children that are currently in the white set to the grey set, and repeat until everything reachable is in the black set. The structure of the grey set determines the particular traversal algorithm:

1. If it is a stack, then you have a depth-first search, which visits all children first and then moves on to siblings.

2. If it is a queue, then you have a breadth-first search, which visits all siblings first and then starts recursing into children.

3. If it is a priority queue keyed by the path cost so far (the sum of edge weights from the start), then you have Dijkstra's algorithm, which will find the shortest path to any node in the graph.

4. If it is a priority queue keyed by in-degree of nodes, then you have a topological sort.

5. If it is a priority queue keyed by the lowest edge weight such that the endpoint of the edge is in the black set, then you have Prim's algorithm for minimum spanning trees.

You can have more exotic data structures too, eg.:

6. If it is a linked-list encoding of a stack done through pointer reversal, you have a mark & sweep garbage collector.

7. If it is a test on whether the pointer points to from-space or to-space, you have a Cheney-style breadth-first copying garbage collector.

I'd never seen a formalism that described this - it was just something I figured out when I was implementing my 3rd or so topological sort. I assumed that something like that must exist though - the notion of white/grey/black sets for graph traversal is well established in at least the garbage collection literature, and from there it's a short leap to figure out what structure of the grey set corresponds to which graph algorithm.

I just took a quick look at the A* entry in Wikipedia - I've heard of it but never had a reason to implement it myself - but it seems like it's a generalization of Dijkstra's algorithm where edge weights can be arbitrary functions. In this case, it's the same as case #3, but the keys to the priority queue are a function of the node instead of a straight list of edge weights.
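
To make the grey-set idea concrete, here is a small sketch covering cases 1-3 (the names are purely illustrative, and keying the priority queue differently gives the other cases):

    import heapq

    def traverse(graph, start, grey="stack"):
        # graph maps node -> list of (neighbour, edge weight)
        order, black = [], set()
        frontier = [(0, start)]                  # the grey set
        while frontier:
            if grey == "stack":                  # 1. depth-first
                cost, node = frontier.pop()
            elif grey == "queue":                # 2. breadth-first
                cost, node = frontier.pop(0)
            else:                                # 3. priority queue by path cost: Dijkstra
                cost, node = heapq.heappop(frontier)
            if node in black:
                continue
            black.add(node)                      # grey -> black
            order.append(node)
            for nbr, w in graph.get(node, ()):
                if nbr not in black:             # white (or still grey) -> grey
                    item = (cost + w, nbr)
                    if grey == "pqueue":
                        heapq.heappush(frontier, item)
                    else:
                        frontier.append(item)
        return order

    g = {'a': [('b', 1), ('c', 4)], 'b': [('c', 1)], 'c': []}
    print(traverse(g, 'a', 'pqueue'))   # ['a', 'b', 'c']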


Very nice presentation!

In A*, if you use a heuristic to select an edge (using a priority queue, i.e., your #3), it gives a best-first search. But it is possible to just use a stack/queue and forget about the heuristic; then you get a nice generic way of doing tree (graph) traversal.

Regarding your use of white/black/grey lists: in a tree (which is acyclic), you would only need a list of nodes to visit; in a general graph, however, you need to keep a list of nodes that you have already seen/visited so as not to get stuck in a cycle when the graph is cyclic. In tracing garbage collection algorithms this is often used because the live program data generated by the mutator is cyclic, or at least potentially can be.

Thanks for mentioning this; even though I know my way around GC stuff, I would never have come up with this link myself! It is probably good advice for anybody interested to spend some time reading up on garbage collection algorithms; they contain many interesting algorithms on graphs, such as the Deutsch-Schorr-Waite algorithm for traversing a tree without requiring an additional stack. The definitive book on garbage collection is 1996's Jones and Lins: http://www.amazon.com/Garbage-Collection-Algorithms-Automati...


Hi nostrademons, do you mind if I put this observation onto Wikipedia if it is not already there? I would add the cyclic/acyclic observation of sb's reply also. http://en.wikipedia.org/wiki/Graph_traversal is just a stub, it could nearly go there.


I don't mind, but I'm guessing it'll get tagged with [citation needed], and I don't know what an appropriate citation for it is.


The extent of my experience on the topic is in half-comprehending hobbyist's awe and the inclusion of okmij on my "personal heroes and demigods" list. The only word I could half-heartedly associate with this structure is the incredibly lame "hylomorphism"... which actually kinda generally expresses the pattern of much of computing. It's 5 AM and I'm going to sleep, but I'll leave you with the most plausible leaves of a very brief search towards finding some recognized formal name for this structure:

http://blog.sigfpe.com/2009/07/monad-for-combinatorial-searc... http://spivey.oriel.ox.ac.uk/mike/search-jfp.pdf http://www.cs.au.dk/~gerth/papers/jfp96.pdf

If you really don't like yourself, try digging here:

http://hackage.haskell.org/packages/archive/pointless-haskel... http://comonad.com/reader/2009/recursion-schemes/


It's a nice theory but there is a second class which basically consists of two shades of White, Grey, Black, and Red.

Let's say you’re building a system to quickly find the degree of separation between two people on a network. For simplicity assume each node connects to 4 random other nodes. If you start from one end and do a breadth first search you would need to try ~(4^N) nodes to find the link. However, if you start from both ends and do a breadth first search you can try ~2 * (4^(N/2)) nodes to find that same connection.
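
A rough sketch of the two-ended version (the graph is just assumed to be a dict of adjacency lists; names are made up for the example):

    from collections import deque

    def separation(graph, src, dst):
        # grow one BFS level from each end in turn; stop when the frontiers meet
        dist = ({src: 0}, {dst: 0})
        frontier = (deque([src]), deque([dst]))
        while frontier[0] and frontier[1]:
            for side in (0, 1):
                d, q, other = dist[side], frontier[side], dist[1 - side]
                for _ in range(len(q)):          # expand one whole level of this side
                    node = q.popleft()
                    for nbr in graph.get(node, ()):
                        if nbr not in d:
                            d[nbr] = d[node] + 1
                            q.append(nbr)
                meet = d.keys() & other.keys()   # nodes reached from both ends
                if meet:
                    return min(d[x] + other[x] for x in meet)
        return None                              # not connected

    g = {'alice': ['bob'], 'bob': ['alice', 'carol'],
         'carol': ['bob', 'dave'], 'dave': ['carol']}
    print(separation(g, 'alice', 'dave'))        # -> 3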


I'd add some crypto algorithms to that list: crypto hashing (as someone noted below, not every hash is a crypto hash), symmetric and asymmetric encryption, digital signing.

Diffie-Hellman key exchange as well, as it is so simple and so wonderfully counter-intuitive.
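
The whole trick fits in a few lines. A toy sketch with a tiny prime, only to show the shape of it - real deployments use much larger, carefully chosen groups:

    import random

    p, g = 23, 5                         # public: a small prime modulus and a generator
    a = random.randrange(1, p - 1)       # Alice's secret
    b = random.randrange(1, p - 1)       # Bob's secret
    A = pow(g, a, p)                     # Alice sends A over the open channel
    B = pow(g, b, p)                     # Bob sends B over the open channel
    assert pow(B, a, p) == pow(A, b, p)  # both compute the same shared secret g^(ab) mod p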


> Physics simulations. I've found it surprising how useful my intro physics course knowledge has been. Things like vectors, position vs. velocity vs. acceleration, damped harmonic oscillators, FFTs, etc. It's most useful when building UIs and games - oftentimes, you can make a UI feel significantly more natural by adding easing functions that behave like acceleration when you ease in and frictional damping when you ease out.

Any good physics book that I could look into? Preferably books that integrate physics with programming?


I believe my course used Halliday, Resnick, and Walker, but at the intro level, basically anything should do.

I wasn't even really thinking about the integration with programming - the parts I've used have mostly been just knowing what the basic equations were, along with representations like vectors and such. If you have a function of t, then all you need to do to simulate it is advance t by some infinitesimal amount (based on the frame rate, usually) and then compute your new positions. I guess that when you get to more complex simulations, then things like fast matrix multiplication and symbolic differentiation may be useful, but I've never actually used them in my own programming.
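
A tiny sketch of that kind of easing (the constants are arbitrary; the point is just "advance t each frame and integrate"):

    def ease_toward(pos, vel, target, dt, stiffness=40.0, damping=12.0):
        # spring force pulls toward the target, damping bleeds off velocity
        accel = stiffness * (target - pos) - damping * vel
        vel += accel * dt
        pos += vel * dt
        return pos, vel

    pos, vel = 0.0, 0.0
    for frame in range(60):                      # one second at 60 fps
        pos, vel = ease_toward(pos, vel, 100.0, 1 / 60)
    print(pos)                                   # eases in close to 100, no snapping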


> Any good physics book that I could look into? Preferably books that integrate physics with programming?

any game physics book would do that for you, e.g. http://www.amazon.com/Game-Physics-Interactive-3d-Technology... Also, there is Structure and Interpretation of Classical Mechanics (yes, by the same cohort).


It's a good list, but there have been a few articles on here that talked about hashing and other security-related algorithms. Generally speaking, if you're implementing one, you're doing it wrong. Of course, knowing about them and how they work is essential.


The same goes for a bunch of things on my list, eg. sorting, compression, and SVMs. I don't think I've ever had to implement a sort algorithm outside of coursework or an interview context.

I included them because understanding how they work can still be pretty critical to using them effectively. I'll never have to implement a sorting algorithm. I do, however, have to understand that a comparison sort is at best O(N log N), and that if I just want to scan out the top-10 items in a long list, I'm better off with a linear scan (O(N)). I need to know that keeping data in sorted order will let me binary-search on it, for O(log N) access, and that cache & memory hierarchy effects often mean that this is faster than the O(k) access I might get with a prefix trie. I need to know that passing nearly-sorted data to QuickSort, in the absence of something like median-of-3 partitioning, can result in pretty pathological runtimes. I need to know that QuickSort can be done in-place, but MergeSort requires O(N) additional space. OTOH, MergeSort can be done efficiently in situations where mutation and random access are not allowed (eg. tape drives, functional languages), which other sorts don't do so well. I need to know what a stable sort is (very important for UI programmers!) and which algorithms have that property. I need to know that passing the first 80% of sorted input to a statistical or machine-learning algorithm and then saving the last 20% for your test set will give you pretty odd results. :-)
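
For instance, the top-10 case (heapq.nlargest does the single pass with a small heap, versus sorting everything first):

    import heapq, random

    data = [random.random() for _ in range(100000)]
    top_sorted = sorted(data, reverse=True)[:10]   # O(N log N): sorts the whole list
    top_heap   = heapq.nlargest(10, data)          # O(N log 10): one pass with a size-10 heap
    assert top_sorted == top_heap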

Hashing is similar: you don't want to implement it yourself. You do want to understand the various approaches and their limitations.


Not all hash functions are cryptographic hash functions.


Nice list. I would probably generalize #1 into graph traversal though.


Probably yes. I figure that trees come up so often that tree traversal on its own would be #1.

I did add topological sort to the list after remembering just how many times (and for different domains!) I've had to implement one, but other graph traversal algorithms like BFS/DFS are quite useful too.


A* is not just that. The way it is said seems to imply that it is equal to Best First Search. Best First Search only relies on the heuristic function, so for the following graph, where A is the start node and h is the heuristic function:

    +---A---+
    |       |
    B h:1   E h:4
    |
    C h:2
    |
    D h:3

The traversal order according to Best First Search would be:

{A, B, C, D, E}

While A* will search by g(x) = h(x) + d(x), the latter being the distance from the start node to the given node. So, assuming the distance function to be equal to the depth of the node in the tree, it would traverse it this way:

{A, B, C, E, D}

g(D) = h(D) + d(D) = 3 + 3 = 6, while g(E) = h(E) + d(E) = 4 + 1 = 5, so E goes first.
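
A quick check of those two orders on the toy graph above, assuming h(A) = 0 and taking d(x) to be the depth, as in the example:

    import heapq

    graph = {'A': ['B', 'E'], 'B': ['C'], 'C': ['D'], 'D': [], 'E': []}
    h = {'A': 0, 'B': 1, 'C': 2, 'D': 3, 'E': 4}

    def order(key):
        # generic best-first traversal of the tree; key(node, depth) ranks the frontier
        out, frontier = [], [(key('A', 0), 'A', 0)]
        while frontier:
            _, node, depth = heapq.heappop(frontier)
            out.append(node)
            for child in graph[node]:
                heapq.heappush(frontier, (key(child, depth + 1), child, depth + 1))
        return out

    print(order(lambda n, d: h[n]))      # Best First Search: ['A', 'B', 'C', 'D', 'E']
    print(order(lambda n, d: h[n] + d))  # A* with d = depth: ['A', 'B', 'C', 'E', 'D']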


Thanks for posting this, I have a bunch of algorithms to learn. :)


Instead of going by a checklist, try to approach it by domain: Searching; Sorting; floating point, integer computing, and bit-manipulation; numerical analysis and computation; DSP; optimization and dynamic-programming; information theoretic stuff like compression and encryption; graph theoretic algorithms; symbolic algebra; geometric and hierarchic data structures and algorithms; statistical, probabilistic and inferential algorithms; string and sequence processing along with linguistic techniques, etc.

The whole point of big encyclopedic texts like Cormen et al. is to get your feet wet and give you a broad exposure to various techniques. That way you have a small idea of what you want to use next, and you know which domain to focus your research on.

My one recommendation is to ditch programming languages with huge boot-times when playing with algorithms. You want something that you can interact with live and see results; in that regard, even a symbolic algebra system like Maxima would be better than C, C++ and Java. Lisp and Python would be ideal for most algorithms.


That's my usual approach, so the algorithms outside the domains I've studied were unfamiliar. Would you be able to recommend books in graph theory, and statistics/probability? I'd be much obliged.


this is a splendid book that covers the most important probabilistic/statistical algorithms: http://www.inference.phy.cam.ac.uk/mackay/itila/


Thank you!


I've been trying to learn algorithms through this not-so-well known website: train.usaco.org

It gives you questions that you write programs to solve, then judges your programs against test cases. It's fantastic. The questions are difficult, but never impossible, and they drill you well on diverse algorithms, building on previously learned concepts, so that you really understand the material.


I am going through Cormen et al. with the objective of getting my feet wet but the whole process is painfully slow. I thought it would be nice to have at least some familiarity on the widely used algorithms across the globe. However, you are right on ditching programming languages with huge boot times. I am happily trying out my algorithms in Python.


CLR is an extremely well written algorithms textbook, but I use it more as a reference than for self-study. My first algorithms book was Sedgewick's Algorithms (where all algorithms were presented in Pascal), which is a very good algorithms text that is much lighter on several aspects.

Recently, however, I came across the following gem: Algorithms + Data Structures = Programs, by Niklaus Wirth (an Oberon version from 1994 is available for free: http://www.oberon.ethz.ch/WirthPubl/AD.pdf). I think this is hands down one of the best algorithm books. It amazes me how much content Niklaus Wirth is able to present in concise, yet crystal clear writing. Besides the usual algorithms, he includes very interesting applications that probably no other book does: using the partitioning of QuickSort to find the median (pg. 56 in the above PDF, Section 2.3.4), based on an algorithm by C.A.R. Hoare, and an in-depth discussion of polyphase sort (pg. 70, Section 2.4.4), which might be interesting for heavily distributed sorting (that is at least what I imagined as a possible application when I recently re-read parts of it).
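
That median-via-partitioning trick (quickselect) is short enough to sketch. This version allocates new lists rather than partitioning in place the way Wirth/Hoare do, so treat it only as an illustration of the idea:

    import random

    def quickselect(xs, k):
        # k-th smallest (0-based): partition as in QuickSort, but recurse into one side only
        pivot = random.choice(xs)
        lo = [x for x in xs if x < pivot]
        eq = [x for x in xs if x == pivot]
        hi = [x for x in xs if x > pivot]
        if k < len(lo):
            return quickselect(lo, k)
        if k < len(lo) + len(eq):
            return pivot
        return quickselect(hi, k - len(lo) - len(eq))

    data = [7, 1, 5, 9, 3]
    print(quickselect(data, len(data) // 2))  # median -> 5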


Thanks. BTW, here is the correct link for the book that you referred to on your comment - http://www-old.oberon.ethz.ch/WirthPubl/AD.pdf.


Thanks for the correction!


A fellow Oberon fan! Wirth has a rare talent for language design, and I cut my compiler hacking teeth on Oberon, thanks to his tiny booklet on the language and its implementation.


I think his talent for language design is undisputed. But his talent for writing easily equals his talent in programming; I have never come across any work of Wirth's that was not well written and full of interesting insights. A couple of years back, I came across the very interesting paper "Good Ideas, Through the Looking Glass" (2005) http://www.cs.inf.ethz.ch/~wirth/Articles/GoodIdeas_origFig.... which is probably not as Wirth-esque as his books, but still an interesting review of what happened in our discipline, looking back.


Strongly recommend Skiena over Cormen, for what it's worth.


Link for both please, and why? :)


Including dynamic programming is kind of like including divide and conquer.

Nice list. I'd also nominate:

support vector machines, backprop neural net training, Delaunay triangulation, Floyd-Warshall, Kruskal's MST, Newton's method, edit distance, Huffman coding or something with compression (I'd vote zip), some kind of reduction algorithm, and we can't forget simulated annealing.


Good list.

I would include some more pragmatic algorithms, such as Frank Liang's hyphenation algorithm (PhD thesis at Stanford in 1983) http://www.tug.org/docs/liang/

Also De Casteljau's algorithm for Bezier curves from 1959 and de Boor's algorithm for B-splines.

The above are the ones that I have had to implement in the past as part of my work. There is a whole bunch of other algorithms in computer graphics that are equally (or more) important, which many of us rely on but fortunately don't have to implement.

Oh, another classic, pragmatic algorithm is John Carmack's implementation of BSP-based pseudo-3D rendering for the Doom game engine in 1993. See http://doom.wikia.com/wiki/Doom_rendering_engine


Erm... How did this list include the merge sort and heap sort, but not the quicksort?


I don't agree with that list, but I think quicksort is given way more weight than it deserves. I guess it's a desired result of whoever named it "quicksort".

It's not particularly quick, has horrible worst-case guarantees (which, if you care to improve, would make it slower still and very complicated), and is easy to get wrong on many counts (repeating elements; already-sorted input; unlimited recursion depth).

heapsort is simpler than quicksort, and has the best worst-case complexity you can get. It's not stable, but then neither are most quicksorts.

quicksort has its place, but it gets a lot more attention than it deserves.


I agree, but you have to jump all over your data when you do a heapsort, blowing your cache.

It has nice worst-case guarantees, but a well-implemented quicksort has good cache locality.


The main advantage to quicksort is that it's still the fastest average-case-in-practice of the common algorithms. If you want an O(n log n) worst case, introsort combines quicksort, which is fast on average but O(n^2) worst-case, with a fall-back to heapsort; it guarantees it will never spend more than O(n log n) work before falling back, so the whole thing is still worst-case no worse than O(n log n). It'll of course be a constant-factor slower in those fall-back cases than if you had just used heapsort or something directly, but it'll asymptotically be no worse, and on average will get you the nice quicksort win.

Link: http://en.wikipedia.org/wiki/Introsort
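
A toy sketch of that combination - real introsort works in place on an array, so this list-allocating version only shows the depth-cap-then-heapsort idea:

    import heapq, math

    def introsort(xs, depth=None):
        if depth is None:
            depth = 2 * int(math.log2(len(xs) or 1)) + 1
        if len(xs) <= 1:
            return list(xs)
        if depth == 0:
            # recursion got suspiciously deep: fall back to heapsort, O(n log n) worst case
            heap = list(xs)
            heapq.heapify(heap)
            return [heapq.heappop(heap) for _ in range(len(heap))]
        pivot = xs[len(xs) // 2]
        lo = [x for x in xs if x < pivot]
        eq = [x for x in xs if x == pivot]
        hi = [x for x in xs if x > pivot]
        return introsort(lo, depth - 1) + eq + introsort(hi, depth - 1)

    print(introsort([5, 1, 4, 1, 5, 9, 2, 6]))  # [1, 1, 2, 4, 5, 5, 6, 9]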


> The main advantage to quicksort is that it's still the fastest average-case-in-practice of the common algorithms.

I don't think that's true if you average over the available distribution of _implementations_ in the wild. Almost all implementations commit one to three of the "crimes" I mentioned above. And frankly, I've met too many real-world cases that triggered n^2 behaviour in an existing implementation to ever settle for anything with a worse than n log n worst case.

If you're looking for the fastest-in-both-theory-and-practice algorithm, that's TimSort, but it's not in place. The places _in practice_ where Quicksort is the right answer are few and far between. I haven't seen any in the last 10 years.



Since when is dynamic programming considered an algorithm?





