Multi-Task Learning in Atari Video Games with Emergent Tangled Program Graphs (acm.org)
210 points by sengork on July 18, 2017 | 44 comments



I was a little surprised at the headline, since I expected 'outperforms' to mean that it had better end results, which is of course not the case. GP is just much faster due to its relative simplicity, and the results are close enough to those achieved with NNs and deep learning.

> Finally, while generally matching the skill level of controllers from neuro-evolution/deep learning, the genetic programming solutions evolved here are several orders of magnitude simpler, resulting in real-time operation at a fraction of the cost.

> Moreover, TPG solutions are particularly elegant, thus supporting real-time operation without specialized hardware

This is the key takeaway and yet another reminder to not make deep learning the hammer for all your fuzzy problems.


> I was a little surprised at the headline, since I expected 'outperforms' to mean that it had better end results, which is of course not the case. GP is just much faster due to its relative simplicity, and the results are close enough to those achieved with NNs and deep learning.

From figure 3 in the paper it seems like it outperforms DQN on all games but one. So, it has better end results as well.

Edit: There are other results linked in this thread that are better than the 2015 DQN results that the paper refers to.


One of the huge benefits of GPs over NNs is the ease of reverse engineering a GP tree compared to NN models. It's not effortless, however; it's just not mathematically complex like NNs, i.e. a programmer who isn't a mathematician can analyze GPs with a lot of patience.

EDIT: I have found GPs to be relatively slow to very slow, but very likely that is because of the lack of interest and development compared to NNs.


Partly; I think they're also fundamentally slower than NNs because you're manipulating and executing programs (ASTs[1]), while for NNs you just adjust some values.

I've dabbled in GP and I really like it but those ASTs can get huge if they're not carefully pruned and might not add to the solution at all.

[1] https://en.wikipedia.org/wiki/Abstract_syntax_tree
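
For the curious, "manipulating and executing programs" concretely means something like this minimal sketch (plain Python, nested tuples as the AST; the node set is just an illustrative assumption):

    import operator, random

    # Each node is a tuple (op, left, right) or a terminal ('x' or a constant).
    OPS = {'+': operator.add, '-': operator.sub, '*': operator.mul}

    def evaluate(node, x):
        # Recursively interpret the expression tree for input x.
        if node == 'x':
            return x
        if isinstance(node, (int, float)):
            return node
        op, left, right = node
        return OPS[op](evaluate(left, x), evaluate(right, x))

    def random_tree(depth=3):
        # Grow a random tree; under crossover/mutation these bloat quickly
        # unless they're pruned.
        if depth == 0 or random.random() < 0.3:
            return random.choice(['x', random.uniform(-1, 1)])
        return (random.choice(list(OPS)), random_tree(depth - 1), random_tree(depth - 1))

    tree = random_tree()
    print(tree, '->', evaluate(tree, 2.0))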


I don't think you can generalize it this way: ASTs could be compiled to fast machine code, and it really depends on the solutions the algorithms come up with. The NN is bound to its number of parameters, while a GP program varies in length and can become quite small if length is part of the fitness function.
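
That last idea is usually called parsimony pressure. A hedged sketch of what it might look like (the tuple-AST representation and the 0.01 weight are illustrative assumptions, not from the paper):

    def tree_size(node):
        # Count nodes in a tuple-based AST; terminals count as 1.
        if not isinstance(node, tuple):
            return 1
        return 1 + sum(tree_size(child) for child in node[1:])

    def fitness(individual, raw_score, size_weight=0.01):
        # Higher raw score is better; charge a small cost per AST node
        # so that, all else being equal, shorter programs win.
        return raw_score - size_weight * tree_size(individual)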


Well, the accrual of "useless code" (there's a name for this that I forgot) is a known problem, but it is also something that stabilizes the learning process.

I don't think it's as simple as putting the length of the AST in the goal function (but it's something interesting to try).

Depending on compile speed vs running speed you might be better off interpreting your ASTs


It's bloat due to 'introns' (useless statements that don't affect the output, like x = x * 1). And yes, just adding a program-length penalty to the fitness function isn't optimal. I've found it easier to evolve successful programs (letting the bloat happen) and then keep removing statements from correctly generated programs whilst checking that the output stays the same. Probably not optimal either, but I feel like it gives better results.
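
That pruning loop is easy to sketch. Assuming a linear program is just a list of statements and you have some run(program, x) evaluator (both hypothetical names here), greedy intron removal looks like:

    def prune(program, run, test_inputs):
        # Drop any statement whose deletion leaves the output unchanged
        # on every test input (i.e. it was an intron).
        reference = [run(program, x) for x in test_inputs]
        i = 0
        while i < len(program):
            candidate = program[:i] + program[i + 1:]
            if [run(candidate, x) for x in test_inputs] == reference:
                program = candidate   # intron: delete and retry from here
            else:
                i += 1                # this statement matters: keep it
        return program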


I guess a lot of standard compiler optimisation techniques could be used here -- if you care.

I don't know what that would do to the learning process -- but at least it would be useful for end results.


That's kinda horrifying... Accruing useless code to reduce the probability of mutating the useful code? Sounds like a better regularization strategy is needed.

I'm generally pretty suspicious of genetic algorithms; why take a random walk when you can march along the gradient towards a solution?

It might be interesting to try using GA for neural architecture, though, and gradient descent to train the network... (Though it sounds expensive.)


> why take a random walk when you can march along the gradient towards a solution?

Because your problem has no smooth/continuous gradient

Because your problem has a giant search space

Because your problem can do with a "close enough" solution

Try gradient descending a symbolic regression and we'll talk


"I have found GPs to be relatively slow-to-very-slow."

There are two different speeds one could measure in regards to GP's.

The first is the speed at which one evolves solutions. This can, in fact, be frightfully slow and eat up all the hardware you can throw at it (depending on how large your populations are, how fast your fitness function is, etc).

The second is the speed of the evolved solution. This may be slow, but doesn't have to be. In fact, the speed of the solution could be part of what's being evolved. So you could explicitly evolve something fast, if you wanted (or it might just wind up being fast by chance).

One could also take an evolved solution, analyze it, and then optimize it or rewrite it using the insights you got from your analysis. That could be even faster.
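
To make "explicitly evolve something fast" concrete, one hedged option is to fold measured runtime into the fitness (the weight is an arbitrary assumption; evaluate_score stands in for whatever your task-level fitness is):

    import time

    def timed_fitness(candidate, evaluate_score, time_weight=10.0):
        # Reward task score, penalize wall-clock cost, so selection
        # favors solutions that are both good and fast.
        start = time.perf_counter()
        score = evaluate_score(candidate)
        elapsed = time.perf_counter() - start
        return score - time_weight * elapsed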


Would some form of computer-aided tool be able to help the programmer analyze GPs? It sounds like there's a repetitive process behind 'analyze GPs with a lot of patience'. I was thinking along the lines that if such a tool is possible, then it might be possible to let the development and evolution of GPs iterate faster than deep learning.

I found this thread really interesting because, given the small solution size, such implementations can easily be run locally without an expensive, huge hardware setup. I'm definitely amazed at what DL has achieved so far, but it looks to me like a brute-force way to solve a problem: gathering huge amounts of data and crunching on huge amounts of hardware to solve a very specific classification problem. Not saying that is not good, just that it is hard for mass participation at a high production-quality level due to the lack of hardware and such.


Those are really old results. They should compare to this one: https://arxiv.org/pdf/1511.06581.pdf


How can the results be old when the paper is from 2017?


They are comparing their genetic programming results with a deep learning paper published in 2015. [1]

[1] https://www.nature.com/nature/journal/v518/n7540/abs/nature1...


What a time to live in, when papers from 2015 are "really old" in 2017.


1 deep learning year is about 49 dog years.


It highly depends on the field you're in though.


What is "exponential growth"!


It's more logarithmic, isn't it? Right now we're at the beginning stage, where there's massive discoveries and changes happening, but in say, 50 years time, there won't be much changing year-to-year.


Logarithmic or logistic?


This is old too.


The convenient thing about Atari games is that there is usually a numerical score that can be used as input for the fitness function.
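
Which makes the fitness function almost trivial. With the OpenAI Gym Atari wrappers of that era (a sketch; agent.act is an assumed interface, and the old step/reset API is used):

    import gym

    def episode_fitness(agent, game='Pong-v0', max_steps=10000):
        # One episode's summed emulator reward (the game score)
        # serves directly as the individual's fitness.
        env = gym.make(game)
        obs, total = env.reset(), 0.0
        for _ in range(max_steps):
            obs, reward, done, _ = env.step(agent.act(obs))
            total += reward
            if done:
                break
        env.close()
        return total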


This is super cool, but it doesn't outperform deep learning based RL methods.

In fact, I'm not sure how much more compute-efficient it would be than something like A3C, which can produce 4x the score of DQN in a comparable number of hours (and on a CPU).


A3C is only ever run on one game at a time[0]. This paper gets good performance on all games with the same agent.

[0] read as: I have only seen papers with 1 agent per game for A3C


So it can train on one game and play without training on a previously unseen (but also atari) game? That's pretty neat, DQN and A3C certainly can't do that.


No, in this case it is trained on all the games, but retains good scores on all of them. If you train basic A3C on all the games, you'll get poor performance on all of them due to catastrophic forgetting.


Genetic programming seems lightweight; what are some cool applications of it?


There are actually lots of very exciting GP applications! One of my favorites is "Fixing 55 out of 105 bugs for $8 each", in which GP is used to automatically repair code: https://www.cs.virginia.edu/~weimer/p/weimer-icse2012-genpro... . They've also achieved better-than-human level results in antenna design for NASA ( https://ti.arc.nasa.gov/m/pub-archive/1244h/1244%20(Hornby).... ) and in discovering novel quantum computing algorithms ( http://faculty.hampshire.edu/lspector/pubs/GP-quantum-GP98-w... ).

I'll also shamelessly hawk my GP framework for Python here, in case you're interested in experimenting: https://github.com/hchasestevens/monkeys


A couple of years ago I read about some printer manufacturer evolving the shapes of their nozzles. Apparently the problem was nontrivial, and rather than doing complicated analyses up front they found it more efficient to simply generate and fabricate random permutations of shapes and evolve them over many generations. The results were apparently better than any human-designed ones.


It's just a toy, but it's pretty entertaining to watch:

http://boxcar2d.com/

Unfortunately, it requires Flash.


What's a good starting point for someone interested in building game AI?


I'm far from knowledgeable in the field, but I did a survey some time ago and think these items should provide a decent basis:

Dijkstra and A* Pathfinding, Finite State Machines, Decision Trees, Hierarchical Task Networks (SHOP, etc)

Keep in mind that game AI algorithms are all about decision making; there's little "intelligence" involved, unlike the broader aim of "general" AI.
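
Of those, A* is probably the highest-value first project. A minimal grid version (4-way movement and a Manhattan heuristic are my assumptions) fits in a few lines:

    import heapq

    def astar(grid, start, goal):
        # A* over a 2D grid of 0 (free) / 1 (wall); returns path cost or None.
        # Manhattan distance is admissible for 4-way movement.
        h = lambda p: abs(p[0] - goal[0]) + abs(p[1] - goal[1])
        frontier = [(h(start), 0, start)]
        best = {start: 0}
        while frontier:
            _, g, node = heapq.heappop(frontier)
            if node == goal:
                return g
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                r, c = node[0] + dr, node[1] + dc
                if (0 <= r < len(grid) and 0 <= c < len(grid[0])
                        and grid[r][c] == 0 and g + 1 < best.get((r, c), float('inf'))):
                    best[(r, c)] = g + 1
                    heapq.heappush(frontier, (g + 1 + h((r, c)), g + 1, (r, c)))
        return None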


Current game AI is vastly different from this, I think. There's a good writeup on the AI from FEAR that might be a decent read.



gameaibook.org


This sounds interesting. I'd like someone from the field of genetic programming to explain how this works and how it differs from current DL approaches.


Kelly's approach involves evolving teams of programs.

His basic strategy is to have a scalable problem decomposition strategy.

So programs that process pixels and the teaming of those programs are grouped together. The groupings (teams) themselves are co-evolved with the programs, simultaneously.

This enables niching and specialization behavior.

This builds on earlier work on 'symbiotic bid-based genetic programming' from other people at Dalhousie, the same university Kelly is at.

The innovation of this paper is that teams can reference other teams.

This allows for the creation of hierarchical teams. (There are rules to prevent cycles and other edge cases.)

Everyone commenting here is probably going to look just at the numerical game scores and ignore the runtime performance of Kelly's tangled program graphs. They are 1000 times smaller than a deep neural network. That matters for things like running on mobile/embedded devices.
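
To make the "teams referencing teams" structure concrete, here's a hedged sketch of the flavour of the system (class names and bid mechanics are simplified guesses, not the paper's implementation; the cycle guard relies on each team keeping at least one atomic action, which the paper enforces):

    class Program:
        # A small evolved program: bids on a state, and carries either
        # an atomic action (an int) or a reference to another Team.
        def __init__(self, bid_fn, action):
            self.bid_fn = bid_fn
            self.action = action

    class Team:
        def __init__(self, programs):
            self.programs = programs

        def act(self, state, visited=None):
            # Highest bidder wins; team references recurse, and the
            # visited set is the guard against cycles in the graph.
            visited = (visited or set()) | {id(self)}
            winner = max(self.programs, key=lambda p: p.bid_fn(state))
            if isinstance(winner.action, Team):
                if id(winner.action) in visited:
                    # Cycle detected: fall back to the best atomic-action program.
                    atomic = [p for p in self.programs
                              if not isinstance(p.action, Team)]
                    return max(atomic, key=lambda p: p.bid_fn(state)).action
                return winner.action.act(state, visited)
            return winner.action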


> That matters for things like running on mobile/embedded devices.

Ding ding ding. This is where the money is: good yet cheap sensors that sense human-level actions are needed for IoT to be impactful.


This sounds like a divide & conquer approach (sorry if this generalization is too lame). If it can work on less capable devices, then it will create a new wave of innovations in mobile devices.

I am wondering whether a similar approach is possible with current DL models, and whether it would improve performance over what exists or be computationally even more expensive.


In my rough understanding, a DNN is genetic programming in a sense: matrices over a vector field can be thought of as operators, and neurons are layers of XOR circuits...


I like GP, but the problem is the ASTs; these can get huge. And the one real advantage is ease of reverse engineering.


Slightly relevant, here's a state of the art drone AI built using genetic fuzzy systems: https://www.forbes.com/sites/jvchamary/2016/06/28/ai-drone/#...


[flagged]


I don't think genetic algorithms are allowed here.



