I was a little surprised at the headline, since I expected 'outperforms' to mean that it had better end results, which is of course not the case. GP is just much faster due to its relative simplicity, and the results are close enough to those achieved with NNs and deep learning.
> Finally, while generally matching the skill level of controllers from neuro-evolution/deep learning, the genetic programming solutions evolved here are several orders of magnitude simpler, resulting in real-time operation at a fraction of the cost.
> Moreover, TPG solutions are particularly elegant, thus supporting real-time operation without specialized hardware
This is the key takeaway and yet another reminder not to make deep learning the hammer for all your fuzzy problems.
> I was a little surprised at the headline, since I expected 'outperforms' to mean that it had better end results, which is of course not the case. GP is just much faster due to its relative simplicity, and the results are close enough to those achieved with NNs and deep learning.
From figure 3 in the paper it seems like it outperforms DQN on all games but one. So, it has better end results as well.
Edit: There are other results linked in this thread that are better than the 2015 DQN results that the paper refers to.
One of the huge benefits of GPs over NNs is the ease of reverse engineering a GP tree compared to an NN model. It's not effortless, however; it's just not as mathematically complex as NNs, i.e. a programmer who isn't a mathematician can analyze GPs with a lot of patience.
EDIT: I have found GPs to be relatively slow to very slow. But very likely that is because of the lack of interest and development compared to NNs.
Partly, I think they're also fundamentally slower than NNs because you're manipulating and executing programs (ASTs[1]), while for NNs you just adjust some values.
I've dabbled in GP and I really like it, but those ASTs can get huge if they're not carefully pruned, and parts of them might not add to the solution at all.
I don't think you can generalize it this way: ASTs could be compiled to fast machine code, and it really depends on the solutions the algorithms come up with. The NN is bound to its number of parameters, while the genetic program varies in length and can become quite small if length is part of the fitness function.
Well, the accrual of "useless code" (there's a name for this that I forgot) is a known problem, but it is also something that stabilizes the learning process.
I don't think it's as simple as putting the length of the AST in the goal function (but it's something interesting to try).
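For what it's worth, the "length in the goal function" idea is known in the GP literature as parsimony pressure. A minimal sketch, assuming a tree represented as nested tuples and a task score supplied by the surrounding framework (both the representation and the names here are hypothetical):

```python
# Hypothetical sketch: parsimony pressure in a GP fitness function.
# Trees are nested tuples ("op", child1, child2, ...); terminals are
# variable names or constants.

def ast_size(node):
    """Count the nodes in an expression tree."""
    if not isinstance(node, tuple):
        return 1  # terminal (constant or variable)
    return 1 + sum(ast_size(child) for child in node[1:])

def fitness(raw_score, tree, parsimony=0.01):
    """Subtract a small length penalty from the task score.
    The weight must be tuned so the penalty nudges toward smaller
    trees without dominating the task score."""
    return raw_score - parsimony * ast_size(tree)

tree = ("add", ("mul", "x", "x"), ("mul", "x", 1))  # x*x + x*1
print(ast_size(tree))  # 7 nodes
```

As the follow-up comments note, a naive penalty like this can hurt search, since some "useless" code apparently protects useful code from destructive mutation.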
Depending on compile speed vs. running speed, you might be better off interpreting your ASTs.
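Interpreting a GP tree directly is only a few lines. A minimal sketch, again assuming a hypothetical nested-tuple representation:

```python
import operator

# Hypothetical AST representation: nested tuples ("op", args...);
# terminals are variable names (str) or numeric constants.
OPS = {"add": operator.add, "sub": operator.sub,
       "mul": operator.mul, "neg": operator.neg}

def evaluate(node, env):
    """Recursively interpret a GP expression tree against an
    environment mapping variable names to values."""
    if isinstance(node, str):        # variable lookup
        return env[node]
    if not isinstance(node, tuple):  # numeric constant
        return node
    op, *args = node
    return OPS[op](*(evaluate(a, env) for a in args))

tree = ("add", ("mul", "x", "x"), 1)   # x*x + 1
print(evaluate(tree, {"x": 3}))        # 10
```

An interpreter like this pays per-node dispatch cost on every evaluation, which is why compiling the tree can win when the fitness function evaluates it many times.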
It's bloat due to 'introns' (useless statements that don't affect the output, like x = x * 1). And yes, just adding a program-length term to the fitness function isn't optimal. I've found it easier to evolve successful programs (letting the bloat happen) and then keep removing statements from correctly generated programs while checking that the output stays the same. Probably not optimal either, but I feel like it gives better results.
That's kinda horrifying... Accruing useless code to reduce the probability of mutating the useful code? Sounds like a better regularization strategy is needed.
I'm generally pretty suspicious of genetic algorithms; why take a random walk when you can march along the gradient towards a solution?
It might be interesting to try using GA for neural architecture, though, and gradient descent to train the network... (Though it sounds expensive.)
"I have found GPs to be relatively slow-to-very-slow."
There are two different speeds one could measure with regard to GPs.
The first is the speed at which one evolves solutions. This can, in fact, be frightfully slow and eat up all the hardware you can throw at it (depending on how large your populations are, how fast your fitness function is, etc).
The second is the speed of the evolved solution. This may be slow, but doesn't have to be. In fact, the speed of the solution could be part of what's being evolved. So you could explicitly evolve something fast, if you wanted (or it might just wind up being fast by chance).
One could also take an evolved solution, analyze it, and then optimize it or rewrite it using the insights you got from your analysis. That could be even faster.
Would some form of computer-aided tool be able to help the programmer analyze GPs? It sounds like there's some repetitive process when you say 'GPs with a lot of patience'. I was thinking along the lines that if such a tool is possible, then it might be possible to let the development and evolution of GPs iterate faster than deep learning.

I found this thread really interesting because, given the small solution size, such implementations can easily be run locally without an expensive and huge hardware setup. I'm definitely amazed at what DL has achieved so far, but it looks to me like a brute-force way to solve a problem: gathering huge amounts of data and crunching on huge amounts of hardware to solve a very specific classification problem. Not saying that is not good, just saying it is hard for mass participation at a high production-quality level due to lack of hardware and such.
It's more logarithmic, isn't it? Right now we're at the beginning stage, where massive discoveries and changes are happening, but in, say, 50 years' time there won't be much changing year to year.
This is super cool, but it doesn't outperform deep learning based RL methods.
In fact, I'm not sure how much more compute efficient than something like A3C it would be. That can produce 4x the score of DQN in a comparable number of hours (and on a CPU).
So it can train on one game and play without training on a previously unseen (but also atari) game? That's pretty neat, DQN and A3C certainly can't do that.
No, in this case it is trained on all the games but retains good scores on all of them. If you train basic A3C on all the games, you'll get poor performance on all of them due to catastrophic forgetting.
A couple of years ago I read about some printer manufacturer evolving the shapes of their nozzles. Apparently the problem was nontrivial and rather than doing complicated analyses up front they found it more efficient to simply generate and fabricate random permutations of shapes and evolve them over many generations. The results were better than any human designed ones, apparently.
Kelly's approach involves evolving teams of programs.
His basic strategy is to have a scalable problem decomposition strategy.
So programs that process pixels are grouped together into teams. The groupings (teams) themselves are co-evolved with the programs, simultaneously.
This enables niching and specialization behavior.
This builds on earlier work on 'symbiotic bid-based genetic programming' from other people at Dalhousie, the same university Kelly is at.
The innovation of this paper is that teams can reference other teams.
This allows for the creation of hierarchical teams. (There are rules to prevent cycles and other edge cases.)
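The team/bid structure described above can be sketched roughly as follows. This is a toy illustration, not the paper's actual algorithm: each team holds (bid program, action) pairs, where an action is either an atomic action or a reference to another team; selection follows the highest bid, and a visited set stands in for the paper's cycle-prevention rules.

```python
# Hypothetical sketch of action selection in a tangled program graph.

class Team:
    def __init__(self, members):
        # members: list of (bid_fn, action); bid_fn maps state -> float,
        # action is an int (atomic) or another Team (graph edge).
        self.members = members

    def act(self, state, visited=None):
        visited = set() if visited is None else visited
        visited.add(id(self))
        # Consider only members whose target team hasn't been visited,
        # so traversal cannot cycle.
        bids = [(bid(state), action) for bid, action in self.members
                if not (isinstance(action, Team) and id(action) in visited)]
        _, action = max(bids, key=lambda b: b[0])
        if isinstance(action, Team):
            return action.act(state, visited)  # delegate down the graph
        return action                          # atomic action

# A two-level graph: the root can delegate to a leaf team or act directly.
leaf = Team([(lambda s: s[0], 0), (lambda s: 1.0 - s[0], 1)])
root = Team([(lambda s: 0.9, leaf), (lambda s: s[1], 2)])
print(root.act((0.2, 0.5)))  # root delegates to leaf (0.9 > 0.5), leaf picks 1
```

The hierarchical referencing is what lets small specialist teams be reused across contexts, which is part of why the resulting solutions stay so compact.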
Everyone commenting here is probably going to just look at the numerical game scores and ignore the runtime performance of Kelly's tangled program graphs: they are 1000 times smaller than a deep neural network. That matters for things like running on mobile/embedded devices.
This sounds like a divide-and-conquer approach (sorry if this generalization is too lame). If it can work on less capable devices, then it will create a new wave of innovation in mobile devices.
I am wondering whether a similar approach is possible with current DL models, whether it would have any performance improvements over what exists, or whether it would be computationally even more expensive.
In my rough understanding, a DNN is genetic programming in a sense: matrices over a vector field can be thought of as operators, and neurons are layers of xor circuits...