Suggestion: TSP is a combinatorial optimization problem and isn't well suited to GAs. You would be much better off using a method meant for combinatorial optimization: notably ant colony optimization.
I agree on this -- I wrote my thesis on the VRP (Vehicle Routing Problem) which is an even more difficult problem. I've studied GAs extensively (they are pretty awesome, hence my interest), but other methods are much better. I've heard great things about ACOs, but eventually I settled for Tabu Search, because:
- It is deterministic, hence not a black box
- It is a local search, easy to explain
- Does not require as many parameters to tweak (a GA's performance depends on how you set these: mutation rate, population size, crossover mechanism (dozens of possibilities), stopping procedure, selection procedure, etc.)
- Considered in the literature to be one of the strongest classes of algorithms for TSP-type problems, in both solution quality and computational effort
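To give a flavour of why Tabu Search is so easy to explain: the core is "take the best neighbouring tour whose move isn't on the tabu list". Here's a minimal sketch in Python (not my thesis code; 2-opt neighbourhood, and names like `tabu_search_tsp` and the `tenure` value are illustrative choices):

```python
import itertools
import math

def tour_length(tour, dist):
    """Total length of a closed tour under a symmetric distance matrix."""
    return sum(dist[tour[i]][tour[(i + 1) % len(tour)]]
               for i in range(len(tour)))

def tabu_search_tsp(dist, iters=200, tenure=10):
    """Deterministic tabu search over 2-opt moves: each step takes the
    best neighbouring tour whose move is not tabu, and a used move
    stays forbidden for `tenure` iterations."""
    n = len(dist)
    tour = list(range(n))                       # fixed starting tour
    best, best_len = tour[:], tour_length(tour, dist)
    tabu = {}                                   # move -> iteration it expires
    for it in range(iters):
        candidates = []
        for i, j in itertools.combinations(range(1, n), 2):
            if tabu.get((i, j), -1) > it:       # move is still tabu
                continue
            new = tour[:i] + tour[i:j + 1][::-1] + tour[j + 1:]  # 2-opt
            candidates.append((tour_length(new, dist), (i, j), new))
        if not candidates:
            break
        length, move, tour = min(candidates)    # best non-tabu neighbour
        tabu[move] = it + tenure                # forbid undoing it soon
        if length < best_len:
            best, best_len = tour[:], length
    return best, best_len

# Four corners of a unit square, listed out of perimeter order, so the
# identity tour is suboptimal; the optimal tour has length 4.
pts = [(0, 0), (1, 1), (0, 1), (1, 0)]
dist = [[math.dist(p, q) for q in pts] for p in pts]
best, best_len = tabu_search_tsp(dist, iters=50)
```

Note there is no randomness anywhere: given the same starting tour and tenure, you get the same run every time, which is exactly why it isn't a black box.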
If you don't mind expanding a little, I'd be interested to know why they're not suited to combinatorial optimization, and what they are in fact more suited to?
I have searched for a problem that they're more suited to, and I've come to the conclusion that GAs are in fact not known to work on any problem. They do "work" in the sense that sometimes they find an answer, but there are other algorithms that are much simpler and consistently outperform them (notably randomized hill climbing). Here is a paper that despite trying to prove the opposite, clearly shows that GAs are NOT a good way to solve any problem: http://web.cecs.pdx.edu/~mm/nips93.pdf
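For concreteness, "randomized hill climbing" here means something like the RMHC from that paper: flip one random bit, keep the change if fitness doesn't drop. A minimal sketch (Python; OneMax as a stand-in fitness, and the names and parameter values are illustrative):

```python
import random

def rmhc(fitness, n_bits, max_evals=10000, rng=random):
    """Randomized mutation hill climbing: flip one random bit,
    keep the mutant if it is at least as fit, otherwise revert."""
    current = [rng.randint(0, 1) for _ in range(n_bits)]
    best_f = fitness(current)
    for _ in range(max_evals):
        i = rng.randrange(n_bits)
        current[i] ^= 1                 # flip one random bit
        f = fitness(current)
        if f >= best_f:                 # accept ties: allows neutral drift
            best_f = f
        else:
            current[i] ^= 1             # revert the flip
    return current, best_f

random.seed(1)                          # for reproducibility
# Simplest possible fitness: count of 1-bits ("OneMax")
bits, score = rmhc(sum, 32, max_evals=5000)
```

That's the whole algorithm: no population, no crossover, one parameter. That simplicity is the point of the comparison.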
That's not quite my understanding of the paper's results: "As can be seen, the time to reach level one is comparable for the two algorithms, but the GA is much faster at reaching levels 2 and 3. Further, the GA discovers level 3 approximately twice as often as RMHC [randomized hill climbing]"
I don't dispute that there are generally better optimization algorithms than GAs, but this paper does present an artificial case in which a GA outperforms hill climbing.
I never cited the paper's conclusion as my position. What the paper shows is not what the paper concludes. The authors clearly didn't want to show that GAs are inferior, though inadvertently they certainly did. BTW, the authors of the paper conclude that further research is necessary to find cases where GAs outperform hill climbing.
Most GAs are not a very faithful adaptation of actual biological evolution, either. The real evolutionary process is a bit different. Most notably, the "agent" actually implements the algorithm in the sense that it performs its own self-replication and implements the "operators" (crossover, etc.) within its own embodiment. This has certain implications, most notably that the algorithm itself is subject to evolution.
There are other big holes too. For example, few GAs implement anything resembling ontogeny or lifetime learning, and generally have a poor genotype/phenotype divide. That also has big implications. It's a big reason most GAs are far too "greedy" and get stuck at local maxima easily.
"We then analyze an "idealized" genetic algorithm (IGA) that is significantly faster than RMHC and that gives a lower bound for GA speed. We identify the features of the IGA that give rise to this speedup, and discuss how these features can be incorporated into a real GA."
"As can be seen, the time to reach level one is comparable for the two algorithms, but the GA is much faster at reaching levels 2 and 3. Further, the GA discovers level 3 approximately twice as often as RMHC."
"We have presented analyses of two algorithms, RMHC and the IGA, and have used the analyses to identify some general principles of when and how a genetic algorithm will outperform hill climbing."
I'm not sure this paper says what you're claiming that it says.
Note that the "IGA" is not a real algorithm. Knowledge of the solution is encoded into the algorithm. It's no surprise that it outperforms an algorithm that does not have the privilege of knowing the solution before it starts.
Second, to get a result where a GA outperformed hill climbing they did the following:
1. They started with a problem that was DESIGNED to be very well suited to GAs and not so well suited to hill climbing. It turned out that when you do hill climbing in a non-ridiculous way, the GA loses big time (by a factor of 10).
2. Through several steps they further modified the artificial problem to give a disadvantage to hill climbing and an advantage to GAs.
3. They tuned the GA's parameters and did not tune the hill climber's parameters.
4. They compared the performance by number of fitness function evaluations. This is unfair to hill climbing because GAs have bigger overheads elsewhere.
After these steps the GA outperformed hill climbing by about a factor of 2. So it is not clear that the GA would still win if you tuned the hill climber. Even if it did, this is a problem explicitly designed to give GAs an advantage. The fact that they had to go through so much effort to design such a problem doesn't instill much confidence that there exists a real world problem where GAs work.
I have tried to replicate their results and tune the hill climber myself, but the paper is so vague about the exact problem the algorithms are supposed to solve that I was not able to. If anybody knows a study of a problem (preferably real-world) where GAs are shown to outperform reasonable forms of hill climbing, I'd be very happy to hear it.
"If anybody knows a study of a problem (preferably real world) where GAs are shown to outperform reasonable forms of hill climbing I'd be very happy to hear it."
A GA (especially a pseudo-Boolean GA) is not a "golden algorithm" that solves every problem, as most people think. Rather, it is an idea pioneered by Holland in the sixties; nowadays its main purpose is to explain the theoretical aspects of other GA-offshoot evolutionary optimization techniques like GP (Genetic Programming), EA (Evolutionary Algorithms), DE (Differential Evolution), ES (Evolution Strategies), etc.
I think GP has much more impact in attacking real-world problems; the GECCO Humies award list has many interesting results where RMHC may not be suitable -- http://www.genetic-programming.org/combined.html
"Even if it did, this is a problem explicitly designed to give GAs an advantage."
Well, yeah, you'd want to use GAs on problems for which they were well-suited. That's not at all the same as "clearly shows that GAs are NOT a good way to solve any problem".
Sure. The point is the effort they had to expend to find such a problem. That indicates that it is very unlikely that any given practical problem is suitable for GAs. Again, if you are aware of any real problem where GAs work better than hill climbing, please do share. Note that this is not a very high bar. For example, the same test applied to quicksort vs insertion sort is "find any real example where quicksort outperforms insertion sort". If you couldn't find such an example and had to go to great lengths to artificially construct one, then I'd call quicksort a failure, given that it's more complicated than insertion sort. Why GAs were/are as hyped as they are is a mystery to me.
But I agree, that they are not a silver bullet for any kind of problem and for most problems there is almost always a way smarter and more efficient algorithm.
I did not give this presentation to show that GAs are the best solution for any problem. My intention was to teach something that I myself liked in university. In particular, I thought that the way we were taught GAs in university was too complicated.
A domain in which GAs are slowly finding resurgence is algorithmic portfolio management - I work in this area a bit (ie portfolio management - but not directly involved with the GA guys - plus their research is proprietary) - but their premise is that evolution has done pretty well surviving against nature and products of evolution itself.
The financial markets are a bit like that - stochastic exogenous factors (headline risk like the Euro crisis, the FOMC meeting decisions etc.) and evolved responses based on the principle of no-arbitrage (ie other traders) determine portfolio performance and trading decisions along with transaction costs.
But GAs as they exist are unsuitable for this, from what I've gathered from conversations with those quants. The modifications required appear to be in the direction of structured intra-generation genome modification (or adaptation) rather than pure randomness and crossover to produce agents with a specific trading behaviour (much in the same way as evolution allows a sensible change in the DNA to produce a particular protein suited for a task).
> I have searched for a problem that they're more suited to, and I've come to the conclusion that GAs are in fact not known to work on any problem.
Yes, that's my opinion too. GAs model natural selection, which is extremely slow. The only reason it works in nature is the huge timescales and the lack of anything better (since it has to start from essentially nothing - there is no designer).
I would reserve GAs (at least in their naive form) as a last-resort option. They are extremely inefficient because they don't exploit any structure in the problem.
Essentially, a GA solution consists of:
1. Select random points in the search space, hoping that you have arrived at a sufficiently good solution.
2. If none of the solutions are good enough, generate a new set of points by combining the best available points and adding a bit of random error.
3. Repeat the above until time has run out or you've hit a good enough solution.
However, having said that, GAs can be a good option if 1) not much is known about the function you are trying to optimize, 2) the crossover/mutation functions are designed to reflect some problem structure, or 3) the search space is small enough.
Thanks for the hint :) The task originated from an assignment I had to do in an Algorithms class at university as an introduction to GAs, and I liked its simplicity.
The coffeescript rocks/sucks crowd does not belong here. This person wrote a blog post explaining something using a language he is familiar with.
If articles are boiled down to a shouting match because the java-haters or the coffeescript-haters or the ruby-haters found the article first we all may as well go back to /.
I'm not saying that it being your main language is bad; it's just that the syntax is terrible to read and understand for those of us on C-style languages. If the point was to teach others, the vast majority of your audience is not going to be reading CoffeeScript, and they won't learn much.
- There are GA tutorials available in several other languages, if this doesn't work for you. Btw, if a tutorial were offered in Lisp/Haskell, would we see the same suggestion?
- For Ruby/Python programmers, this isn't hard at all to read. Familiarity with non C-style languages might be something to pick up.
- Having moved to Coffee from C-style languages, I prefer coffee. But that is just my opinion.
I think HNers should be more tolerant of language choices. That argument can't really be won.
That's funny, because CoffeeScript is meant to be all about readability. I wonder if you really tried to read this or just commented based on the title. If you did really read it, I wonder if you found it hard because your opinion is tainted by those C-style languages. It's not scientific, of course, but I just got my non-coder, non-technical girlfriend to read through both the CoffeeScript and JavaScript versions of this, and I definitely had to explain a lot less with the CoffeeScript version; she says it's easier to read because it's structured more like plain English.
Just ran through this quickly. I don't know too much about algorithms, so I can't judge GAs' usefulness compared to other methods, but it certainly seems like a cool/fun thing to implement.
Just for clarification, you mean machine learning rather than ML the language, right?
(I ask because this discussion is talking about both the genetic algorithms and Coffeescript).
Hmm... maybe some sort of GPS application that took into account current road conditions -- length of time stopped at traffic lights, performance characteristics of the current vehicle (velocity and acceleration away from lights, etc).
At one time there was talk of making the air traffic control system distributed rather than centralized. The win there is that 1) each aircraft only has to worry about the other aircraft that are near enough to interfere, and 2) if the control system goes out in one aircraft, the systems in the other aircraft can compensate. By contrast, the centralized system has to watch all the aircraft in a large chunk of airspace, and if it goes out, well...
This is sort of what Google is doing with their self-driving cars, I think (that's just my impression -- I haven't looked into it in detail).
https://github.com/janmonschke/Genetic-Algorithms/tree/gh-pa...