Multi-Task Learning in Atari Video Games with Emergent Tangled Program Graphs (acm.org)
210 points by sengork on July 18, 2017 | 44 comments



I was a little surprised at the headline, since I expected 'outperforms' to mean that it had better end results, which is of course not the case. GP is just much faster due to its relative simplicity, and the results are close enough to those achieved with NNs and deep learning.

> Finally, while generally matching the skill level of controllers from neuro-evolution/deep learning, the genetic programming solutions evolved here are several orders of magnitude simpler, resulting in real-time operation at a fraction of the cost.

> Moreover, TPG solutions are particularly elegant, thus supporting real-time operation without specialized hardware

This is the key takeaway and yet another reminder to not make deep learning the hammer for all your fuzzy problems.


> I was a little surprised at the headline, since I expected 'outperforms' to mean that it had better end results, which is of course not the case. GP is just much faster due to its relative simplicity, and the results are close enough to those achieved with NNs and deep learning.

From figure 3 in the paper it seems like it outperforms DQN on all games but one. So, it has better end results as well.

Edit: There are other results linked in this thread that are better than the 2015 DQN results that the paper refers to.


One of the huge benefits of GPs over NNs is the ease of reverse engineering a GP tree compared to NN models. It's not effortless, however; it's just not mathematically complex like NNs, i.e. a programmer who isn't a mathematician can analyze GPs with a lot of patience.

EDIT: I have found GPs to be relatively slow to very slow, but very likely that is because of the lack of interest and development compared to NNs.


Partly; I think they're also fundamentally slower than NNs because you're manipulating and executing programs (ASTs[1]), while for NNs you just adjust some values.

I've dabbled in GP and I really like it but those ASTs can get huge if they're not carefully pruned and might not add to the solution at all.

[1] https://en.wikipedia.org/wiki/Abstract_syntax_tree
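
For the curious, "manipulating and executing programs" concretely means something like this minimal sketch (plain Python, nested tuples as the AST; the node set is just an illustrative assumption):

    import operator, random

    # Each node is a tuple (op, left, right) or a terminal ('x' or a constant).
    OPS = {'+': operator.add, '-': operator.sub, '*': operator.mul}

    def evaluate(node, x):
        # Recursively interpret the expression tree for input x.
        if node == 'x':
            return x
        if isinstance(node, (int, float)):
            return node
        op, left, right = node
        return OPS[op](evaluate(left, x), evaluate(right, x))

    def random_tree(depth=3):
        # Grow a random tree; under crossover/mutation these bloat quickly
        # unless they're pruned.
        if depth == 0 or random.random() < 0.3:
            return random.choice(['x', random.uniform(-1, 1)])
        return (random.choice(list(OPS)), random_tree(depth - 1), random_tree(depth - 1))

    tree = random_tree()
    print(tree, '->', evaluate(tree, 2.0))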


I don't think you can generalize it this way: ASTs could be compiled to fast machine code, and it really depends on the solutions the algorithms come up with. The NN is bound to its number of parameters, while a GP program varies in length and can become quite small if length is part of the fitness function.
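
That last idea is usually called parsimony pressure. A hedged sketch of what it might look like (the tuple-AST representation and the 0.01 weight are illustrative assumptions, not from the paper):

    def tree_size(node):
        # Count nodes in a tuple-based AST; terminals count as 1.
        if not isinstance(node, tuple):
            return 1
        return 1 + sum(tree_size(child) for child in node[1:])

    def fitness(individual, raw_score, size_weight=0.01):
        # Higher raw score is better; charge a small cost per AST node
        # so that, all else being equal, shorter programs win.
        return raw_score - size_weight * tree_size(individual)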


Well, the accrual of "useless code" (there's a name for this that I forgot) is a known problem, but it is also something that stabilizes the learning process.

I don't think it's as simple as putting the length of the AST in the goal function (but it's something interesting to try).

Depending on compile speed vs running speed you might be better off interpreting your ASTs


It's bloat due to 'introns' (useless statements that don't affect the output, like x = x * 1). And yes, just adding a program-length penalty to the fitness function isn't optimal. I've found it easier to evolve successful programs (letting the bloat happen) and then keep removing statements from correctly generated programs whilst checking that the output stays the same. Probably not optimal either, but I feel like it gives better results.
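
That pruning loop is easy to sketch. Assuming a linear program is just a list of statements and you have some run(program, x) evaluator (both hypothetical names here), greedy intron removal looks like:

    def prune(program, run, test_inputs):
        # Drop any statement whose deletion leaves the output unchanged
        # on every test input (i.e. it was an intron).
        reference = [run(program, x) for x in test_inputs]
        i = 0
        while i < len(program):
            candidate = program[:i] + program[i + 1:]
            if [run(candidate, x) for x in test_inputs] == reference:
                program = candidate   # intron: delete and retry from here
            else:
                i += 1                # this statement matters: keep it
        return program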


I guess a lot of standard compiler optimisation techniques could be used here -- if you care.

I don't know what that would do to the learning process -- but at least it would be useful for end results.


That's kinda horrifying... Accruing useless code to reduce the probability of mutating the useful code? Sounds like a better regularization strategy is needed.

I'm generally pretty suspicious of genetic algorithms; why take a random walk when you can march along the gradient towards a solution?

It might be interesting to try using GA for neural architecture, though, and gradient descent to train the network... (Though it sounds expensive.)


> why take a random walk when you can march along the gradient towards a solution?

Because your problem has no smooth/continuous gradient

Because your problem has a giant search space

Because your problem can do with a "close enough" solution

Try gradient descending a symbolic regression and we'll talk


"I have found GPs to be relatively slow-to-very-slow."

There are two different speeds one could measure in regards to GP's.

The first is the speed at which one evolves solutions. This can, in fact, be frightfully slow and eat up all the hardware you can throw at it (depending on how large your populations are, how fast your fitness function is, etc).

The second is the speed of the evolved solution. This may be slow, but doesn't have to be. In fact, the speed of the solution could be part of what's being evolved. So you could explicitly evolve something fast, if you wanted (or it might just wind up being fast by chance).

One could also take an evolved solution, analyze it, and then optimize it or rewrite it using the insights you got from your analysis. That could be even faster.
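
To make "explicitly evolve something fast" concrete, one hedged option is to fold measured runtime into the fitness (the weight is an arbitrary assumption; evaluate_score stands in for whatever your task-level fitness is):

    import time

    def timed_fitness(candidate, evaluate_score, time_weight=10.0):
        # Reward task score, penalize wall-clock cost, so selection
        # favors solutions that are both good and fast.
        start = time.perf_counter()
        score = evaluate_score(candidate)
        elapsed = time.perf_counter() - start
        return score - time_weight * elapsed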


Would some form of computer-aided tool be able to help the programmer analyze GPs? It sounds like there's a repetitive process behind 'analyze GPs with a lot of patience'. I was thinking along the lines that if such a tool is possible, then it might be possible to let the development and evolution of GPs iterate faster than deep learning.

I found this thread really interesting because, given the small solution size, such implementations can easily be run locally without an expensive, huge hardware setup. I'm definitely amazed at what DL has achieved so far, but it looks to me like a brute-force way to solve a problem: gathering huge amounts of data and crunching on huge amounts of hardware to solve a very specific classification problem. Not saying that is not good, just that it is hard for mass participation at a high production-quality level due to the lack of hardware and such.


Those are really old results. They should compare to this one: https://arxiv.org/pdf/1511.06581.pdf


How can the results be old when the paper is from 2017?


They are comparing their genetic programming results with a deep learning paper published in 2015. [1]

[1] https://www.nature.com/nature/journal/v518/n7540/abs/nature1...


What a time to live in, when papers from 2015 are "really old" in 2017.


1 deep learning year is about 49 dog years.


It highly depends on the field you're in though.


What is "exponential growth"!


It's more logarithmic, isn't it? Right now we're at the beginning stage, where there's massive discoveries and changes happening, but in say, 50 years time, there won't be much changing year-to-year.


Logarithmic or logistic?


This is old too.


The convenient thing about Atari games is that there is usually a numerical score that can be used as input for the fitness function.
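
Which makes the fitness function almost trivial. With the OpenAI Gym Atari wrappers of that era (a sketch; agent.act is an assumed interface, and the old step/reset API is used):

    import gym

    def episode_fitness(agent, game='Pong-v0', max_steps=10000):
        # One episode's summed emulator reward (the game score)
        # serves directly as the individual's fitness.
        env = gym.make(game)
        obs, total = env.reset(), 0.0
        for _ in range(max_steps):
            obs, reward, done, _ = env.step(agent.act(obs))
            total += reward
            if done:
                break
        env.close()
        return total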


This is super cool, but it doesn't outperform deep learning based RL methods.

In fact, I'm not sure how much more compute-efficient it would be than something like A3C, which can produce 4x the score of DQN in a comparable number of hours (and on a CPU).


A3C is only ever run on one game at a time[0]. This paper gets good performance on all games with the same agent.

[0] read as: I have only seen papers with 1 agent per game for A3C


So it can train on one game and play without training on a previously unseen (but also atari) game? That's pretty neat, DQN and A3C certainly can't do that.


No, in this case it is trained on all the games, but retains good scores on all of them. If you train basic A3C on all the games, you'll get poor performance on all of them due to catastrophic forgetting.


Genetic programming seems lightweight; what are some cool applications of it?


There are actually lots of very exciting GP applications! One of my favorites is "Fixing 55 out of 105 bugs for $8 each", in which GP is used to automatically repair code: https://www.cs.virginia.edu/~weimer/p/weimer-icse2012-genpro... . They've also achieved better-than-human level results in antenna design for NASA ( https://ti.arc.nasa.gov/m/pub-archive/1244h/1244%20(Hornby).... ) and in discovering novel quantum computing algorithms ( http://faculty.hampshire.edu/lspector/pubs/GP-quantum-GP98-w... ).

I'll also shamelessly hawk my GP framework for Python here, in case you're interested in experimenting: https://github.com/hchasestevens/monkeys


A couple of years ago I read about some printer manufacturer evolving the shapes of their nozzles. Apparently the problem was nontrivial, and rather than doing complicated analyses up front they found it more efficient to simply generate and fabricate random permutations of shapes and evolve them over many generations. The results were apparently better than any human-designed ones.


It's just a toy, but it's pretty entertaining to watch:

http://boxcar2d.com/

Unfortunately, it requires Flash.


What's a good starting point for someone interested in building game AI?


I'm far from knowledgeable in the field, but I did a survey some time ago and think these items should provide a decent basis:

Dijkstra and A* Pathfinding, Finite State Machines, Decision Trees, Hierarchical Task Networks (SHOP, etc)

Keep in mind that game AI algorithms are all about decision making; there's little "intelligence" involved, unlike the broader aim of "general" AI.
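
Of those, A* is probably the highest-value first project. A minimal grid version (4-way movement and a Manhattan heuristic are my assumptions) fits in a few lines:

    import heapq

    def astar(grid, start, goal):
        # A* over a 2D grid of 0 (free) / 1 (wall); returns path cost or None.
        # Manhattan distance is admissible for 4-way movement.
        h = lambda p: abs(p[0] - goal[0]) + abs(p[1] - goal[1])
        frontier = [(h(start), 0, start)]
        best = {start: 0}
        while frontier:
            _, g, node = heapq.heappop(frontier)
            if node == goal:
                return g
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                r, c = node[0] + dr, node[1] + dc
                if (0 <= r < len(grid) and 0 <= c < len(grid[0])
                        and grid[r][c] == 0 and g + 1 < best.get((r, c), float('inf'))):
                    best[(r, c)] = g + 1
                    heapq.heappush(frontier, (g + 1 + h((r, c)), g + 1, (r, c)))
        return None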


Current game AI is vastly different from this, I think. There's a good writeup on the AI from FEAR that might be a decent read.



gameaibook.org


This sounds interesting. I'd like someone from the field of genetic programming to explain how this works and how it differs from current DL approaches.


Kelly's approach involves evolving teams of programs.

His basic strategy is to have a scalable problem decomposition strategy.

So programs that process pixels and the teaming of those programs are grouped together. The groupings (teams) themselves are co-evolved with the programs, simultaneously.

This enables niching and specialization behavior.

This builds on earlier work on 'symbiotic bid-based genetic programming' from other people at Dalhousie, the same university Kelly is at.

The innovation of this paper is that teams can reference other teams.

This allows for the creation of hierarchical teams. (There are rules to prevent cycles and other edge cases.)

Everyone commenting here is probably going to look just at the numerical game scores and ignore the runtime performance of Kelly's tangled program graphs. They are 1000 times smaller than a deep neural network. That matters for things like running on mobile/embedded devices.
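
To make the "teams referencing teams" structure concrete, here's a hedged sketch of the flavour of the system (class names and bid mechanics are simplified guesses, not the paper's implementation; the cycle guard relies on each team keeping at least one atomic action, which the paper enforces):

    class Program:
        # A small evolved program: bids on a state, and carries either
        # an atomic action (an int) or a reference to another Team.
        def __init__(self, bid_fn, action):
            self.bid_fn = bid_fn
            self.action = action

    class Team:
        def __init__(self, programs):
            self.programs = programs

        def act(self, state, visited=None):
            # Highest bidder wins; team references recurse, and the
            # visited set is the guard against cycles in the graph.
            visited = (visited or set()) | {id(self)}
            winner = max(self.programs, key=lambda p: p.bid_fn(state))
            if isinstance(winner.action, Team):
                if id(winner.action) in visited:
                    # Cycle detected: fall back to the best atomic-action program.
                    atomic = [p for p in self.programs
                              if not isinstance(p.action, Team)]
                    return max(atomic, key=lambda p: p.bid_fn(state)).action
                return winner.action.act(state, visited)
            return winner.action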


> That matters for things like running on mobile/embedded devices.

Ding ding ding. This is where the money is: good yet cheap sensors that sense human-level actions are needed for IoT to be impactful.


This sounds like a divide & conquer approach (sorry if this generalization is too lame). If it can work on less capable devices, then it will create a new wave of innovations in mobile devices.

I am wondering whether a similar approach is possible with current DL models, and whether it would improve performance over what exists or be computationally even more expensive.


In my rough understanding, a DNN is genetic programming in a sense: matrices over a vector field can be thought of as operators, and neurons are layers of XOR circuits...


I like GP, but the problem is the ASTs; these can get huge. And the one real advantage is ease of reverse engineering.


Slightly relevant, here's a state of the art drone AI built using genetic fuzzy systems: https://www.forbes.com/sites/jvchamary/2016/06/28/ai-drone/#...


[flagged]


I don't think genetic algorithms are allowed here.



