Hacker News
The quest to evolve neural networks through evolutionary algorithms (oreilly.com)
213 points by hardmaru on Oct 30, 2017 | 104 comments



I spent more than 10 years working on this problem. One major challenge is how to represent the genotype-phenotype map so that genetic variation leads to favorable phenotypic changes. For this you start going down a rabbit hole of developmental evolution and the evolution of evolvability and then a seemingly unrelated subject: the evolution of robustness.

For anyone that's interested, here's an article I published in 2008; it has about 175 citations.

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2538912/

I’d also recommend the work by Peter Eggenberger from the late 90s.


> evolution of evolvability and then a seemingly unrelated subject: the evolution of robustness.

Isn't there an analogue to this in software engineering?

There is an extremely large space of programs that produce exactly the same output on a given input, but we humans favor programs that we can read and understand. We also favor programs that are modular and composable, because bits of those programs tend to get reused to create new programs and are easier to write than new 'blob of code' programs.

One can view programs as memetic 'self-replicators', with a selection process dominated by the preferences of human readers and writers.

If we analyzed many dependency graphs between code modules, or a graph of function-call usage between snippets, I bet you'd find that actual human-written programs have the same 'sparse' connectivity you see in biological networks, because maintainability implies writing code along sane API abstraction boundaries.

TL;DR: evolutionary pressure might mean that nature naturally evolves abstractions within genetic code


The mechanism for evolution is often the opposite of abstraction. Living things tend to make a copy of some really important bit of 'code' and then mutate that 'copy' rather than say losing the ability to form cell walls.

On the other hand, if a cell is making X, then a new process can easily leverage the existence of that X when doing something new.

Net result: copy-paste coding + dependency hell.


I have always seen evolutionary algorithms as a version of gradient descent that throws away the gradient and instead just goes randomly in all directions until it finds something that works (which is extremely wasteful).

I suppose this is wrong since people still find these techniques useful, but what are the advantages of these techniques compared to gradient descent? (other than the fact that you don't need your fitness function to be differentiable)


> I have always seen evolutionary algorithm as a version of gradient descent... which is extremely wasteful

Evolutionary algorithms work by sampling a combinatorial solution space, and learning about the interactions between solution components. This is made explicit (and much more elegant) in the definition of EDAs:

https://en.wikipedia.org/wiki/Estimation_of_distribution_alg...
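
(To make that concrete, here's a minimal sketch of UMDA, about the simplest EDA. Note that UMDA models each variable independently; richer EDAs such as MIMIC or BOA explicitly learn the interactions between solution components. All names and parameters below are illustrative.)

    import numpy as np

    # Minimal UMDA sketch: sample from a distribution over solutions,
    # select the best samples, re-estimate the distribution, repeat.
    def umda(fitness, n_bits=20, pop_size=100, n_select=50, n_gens=50):
        p = np.full(n_bits, 0.5)                 # marginal probability of each bit
        for _ in range(n_gens):
            pop = (np.random.rand(pop_size, n_bits) < p).astype(int)
            scores = np.array([fitness(x) for x in pop])
            elite = pop[np.argsort(scores)[-n_select:]]  # keep the best half
            p = np.clip(elite.mean(axis=0), 0.05, 0.95)  # re-estimate, keep exploring
        return p

    # Example: maximise the number of ones (OneMax).
    print(umda(lambda x: x.sum()))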

By contrast, gradient descent makes the assumption that such interaction effects don't exist or don't matter, which doesn't work for most optimisation problems - you get stuck in local optima.


Thanks! That link was really helpful. I guess some optimisation methods using a Hessian take those interaction effects into consideration, but they are too expensive and not used in practice.

> doesn't work for most optimisation problems - you get stuck in local optima.

This bit is not necessarily true. In practice, and given the high dimensionality, it is (apparently) quite hard to get stuck in local optima. [0]

[0] https://www.quora.com/How-come-neural-networks-dont-get-stuc...


> This bit is not necessarily true. In practice, and given the high dimensionality, it is (apparently) quite hard to get stuck in local optima. [0]

In almost every optimisation problem I've encountered in practice, you do get stuck in local optima using simple techniques like SGD. Interaction effects are everywhere, and create local optima.


Evolutionary algorithms may also let you evolve the architecture. Think of a fly brain evolving into a human brain using the same genetic toolbox. With learning algorithms you're assuming a fixed architecture. It's hard to see how we'd get open-ended evolutionary AI without evolutionary algorithms. At that point the learning algorithm is the context-dependent optimization step after neurogenesis.


The terms you are using are too vague, and seem disconnected from the reality of these algorithms.

How do evolutionary algorithms let you evolve the architecture? You define a model space, and explore it stochastically (with different evolutionary strategies). The model space you are exploring is linked to a specific architecture.

Regardless, there are also efforts in Deep Learning to automatically learn optimal network architectures ("AutoML").


Of course you can mix your gradient descent with simulated annealing. But that gets close to evolutionary algorithms.


The fact that you don't need your fitness function to be differentiable is a big advantage :) If you had the gradient, it would always help to use it.

In reinforcement learning: evolutionary algorithms work by applying perturbations in the parameter space, not in the action space. If your problem is sensitive to action-space perturbations (requires consistent strategies), it might not be possible or efficient to use "standard" RL. Evolution strategies (ES) are quite competitive in RL, especially paired with some dimensionality reduction technique (e.g. an unsupervised VAE).

Escaping local minima is not really an advantage of ES either. ES are nearly local: they sample the space around the current solution to approximate the gradient. They'll sometimes get stuck in local minima too (if there are any). Bayesian optimization is more interesting, as it's actually global.
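
(A sketch of one ES step in that spirit, along the lines of OpenAI-style ES; names and constants are illustrative.)

    import numpy as np

    # One step of a simple evolution strategy: perturb the *parameters*
    # (not the actions), then use fitness-weighted noise as a gradient estimate.
    def es_step(theta, fitness, sigma=0.1, lr=0.01, n_samples=50):
        eps = np.random.randn(n_samples, theta.size)       # parameter-space noise
        rewards = np.array([fitness(theta + sigma * e) for e in eps])
        rewards = (rewards - rewards.mean()) / (rewards.std() + 1e-8)
        grad_est = eps.T @ rewards / (n_samples * sigma)   # estimated gradient
        return theta + lr * grad_est                       # ascend the estimate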


"what are the advantages of these techniques compared to gradient descent?"

Well, the really big advantage is that simple gradient descent can get you stuck in local maxima (or minima, depending on how you look at things). Many fitness landscapes are guaranteed to produce suboptimal results if you use gradient descent. By contrast, a sufficiently large random variation will eventually "get out of the local hole"/"get past the local hump".

It doesn't necessarily have anything to do with differentiability; consider the graph of 0.5x+sin(x). A gradient descent technique is going to get stuck on that one right away, while random exploration won't.
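
(A quick sketch of that comparison: gradient descent from x = 0 parks in the local dip near x ≈ -2.09, while random jumps keep finding lower values, since f decreases without bound.)

    import numpy as np

    f  = lambda x: 0.5 * x + np.sin(x)
    df = lambda x: 0.5 + np.cos(x)

    x = 0.0
    for _ in range(1000):          # gradient descent: stops at a local minimum
        x -= 0.1 * df(x)

    best = 0.0
    for _ in range(1000):          # random exploration: keeps escaping
        cand = best + np.random.randn() * 5.0
        if f(cand) < f(best):
            best = cand

    print(f(x), f(best))           # random exploration ends up far lower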

Edit to address some stuff that dragontamer brought up below:

A good genetic algorithm is mostly going to produce offspring that are close to the parents (recombination, which de facto is similar to hillclimbing/gradient descent/gradient ascent), but occasionally it's going to try something that's completely off the chain (mutation). Some of the early work with genetic algorithms focused on mutation, but it soon turned out that recombination was better most of the time. Nonetheless, you still need some mutation, or you run the risk of getting stuck at local extrema.
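
(A bare-bones sketch of that balance, with illustrative names: crossover keeps offspring near the parents, and a low mutation rate supplies the occasional jump.)

    import random

    def crossover(a, b):                    # offspring stay near the parents
        cut = random.randrange(1, len(a))
        return a[:cut] + b[cut:]

    def mutate(genome, rate=0.01):          # rare jumps off the chain
        return [1 - g if random.random() < rate else g for g in genome]

    def next_generation(pop, fitness, elite_frac=0.5):
        parents = sorted(pop, key=fitness, reverse=True)
        parents = parents[: max(2, int(len(pop) * elite_frac))]
        return [mutate(crossover(*random.sample(parents, 2)))
                for _ in range(len(pop))]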


I think the premise (that neural nets get stuck in local optima) is not obviously true, and there has been a lot of research on non-convex optimisation suggesting that this is not much of an issue. I am not a researcher, but this answer [0] points to this research.

I would also say that there are multiple ways to escape local optima (setting a larger learning rate, multiple random initialisations, ensembling).

https://www.quora.com/How-come-neural-networks-dont-get-stuc...


You say "not much of an issue" but your link says that it's an "open question that is probably being worked on in the community".

Those aren't the same thing at all.


Minima/maxima in high-dimensional spaces are not much of an issue; the bigger issue is saddle points.

cf https://arxiv.org/pdf/1406.2572.pdf

This is presuming that the problem has a continuous formulation anyways.


Saddle points are not much of an issue too: https://arxiv.org/abs/1710.07406


To add to what others have said, local extrema are actually quite rare for "random" high-dimensional functions. This is suggested by a simple heuristic based on the Hessian matrix. As the dimension increases, it becomes exponentially harder for every eigenvalue to have the same sign (coin flipping experiment). So it stands to reason that saddle points should be far more common, and SGD is actually quite good at overcoming saddle points.
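
(You can check the coin-flip heuristic numerically; a sketch, with the caveat that real loss-surface Hessians are not i.i.d. random matrices.)

    import numpy as np

    # Fraction of random symmetric matrices whose eigenvalues all share a sign,
    # i.e. candidate minima/maxima rather than saddles.
    def frac_definite(n, trials=2000):
        hits = 0
        for _ in range(trials):
            a = np.random.randn(n, n)
            eig = np.linalg.eigvalsh((a + a.T) / 2)   # random symmetric "Hessian"
            hits += (eig > 0).all() or (eig < 0).all()
        return hits / trials

    for n in (2, 4, 8, 16):
        print(n, frac_definite(n))   # drops off rapidly with dimension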


This does not accord with my reading of history. GAs (heavily crossover-based evolution) were failing pretty hard at optimization until the early '90s, when they adopted some good ideas (real coding, better mutation operators) from the (largely mutation-oriented) ES community. If you look at the most effective single-objective evolutionary algorithms today, they're mostly descendants of ES, like DE and CMA-ES. Honorable mention to the micro-GA as a descendant of the GA lineage, but its effectiveness is largely driven by discontinuous mutation events. The Goldberg-style GA is woefully out of date and not particularly useful.


One thing I don't see mentioned here is flexibility. You can plug any ugly old thing into an evolutionary algorithm and let it run. Give me a working computational model at breakfast time and I'll have runs going by lunch. Cranky ancient FORTRAN that requires you to write a new input file for every evaluation? No problem. You have to compile the inputs into the model to make it run? Fine. Badly scaled inputs or outputs? EA doesn't care. More than one objective? Great! Population-based search is a natural fit for multiple objective optimization.

As long as you're clear on what the decisions and objectives are, and you're able to run the model yourself, layering evolutionary optimization on top is easy.


I think the multiobjective part is very important. In other algorithms, you often have to specify the solution space beforehand. For instance, when optimizing between light weight and strength, one would have to say beforehand how to weight those two properties (f = 100s - 5w, for instance). This throws away a whole dimension in your search space. With EAs, you can let the search roam free and select the best tradeoffs from the Pareto set after the algorithm is done.
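
(A tiny sketch of that Pareto filtering for two objectives, weight to minimise and strength to maximise; the numbers are made up.)

    # Keep only non-dominated (weight, strength) designs.
    def pareto_front(solutions):
        front = []
        for s in solutions:
            dominated = any(o != s and o[0] <= s[0] and o[1] >= s[1]
                            for o in solutions)
            if not dominated:
                front.append(s)
        return front

    designs = [(10, 50), (12, 80), (8, 40), (12, 60), (20, 90)]
    print(pareto_front(designs))   # (12, 60) drops out; the tradeoffs remain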


Yes, exactly! And it gets even better. With a good MOEA, you can keep track of your evaluations, do a short run, decide your objectives don't really express your understanding of the problem, and seed your old evaluations into a run with new objectives. This means you can change objectives without repeating computationally expensive model runs.


The most interesting work in EA, IMO, is in 'search space illumination'; MAP-Elites and its variations are a great example.


Thanks for pointing this out. I had a look at http://www.evolvingai.org/files/1504.04909v1.pdf. It's an interesting read, although I'm not sold on the value of feature space exploration. I think it muddies the waters around the problem of balancing optimization with diversity, by lumping decisions and constraints together. What's to stop it from over-sampling regions of the decision space that have a steep gradient in the other features?


What's to stop the discovered solutions from being similar? Nothing but the dimensions of interest. If the best solutions with respect to features a, b, and (a&b) are all the same, then it does not seek different solutions.

This may indicate you chose redundant features, though.

Here's an interesting application that provides some motivation. https://arxiv.org/abs/1407.3501


Regarding the similarity of solutions:

If, in a particular region, your model outputs are particularly sensitive to the inputs, that region can be overrepresented in a feature map. What I mean by this is that we cannot effectively choose between the solutions in this region -- the region is small compared to our ability to control inputs. Overrepresenting such a region in a population-based search starves the rest of the problem space of the algorithm's attention.

Thanks for the link, by the way -- this is a particularly interesting application.


I think you confused "evolutionary algorithm" with Simulated Annealing, because Simulated Annealing is LITERALLY a random walk (biased towards higher values, but it's strongly random nonetheless).

https://en.wikipedia.org/wiki/Simulated_annealing

Simulated Annealing works because it doesn't get stuck in local maxima. Do a simple gradient ascent on this simple 2D plot, and it's very, very unlikely to achieve the true global value: https://upload.wikimedia.org/wikipedia/commons/d/d5/Hill_Cli...

But if you do Simulated Annealing (aka: random walk), you get to the global maximum much more reliably. Genetic Algorithms have a similar effect to Simulated Annealing because Genetic Algorithms are strongly randomized. But they still "hill climb" explicitly, because it's rare for parents to be thrown away if they were superior to the children.


> Simulated Annealing is LITERALLY a random walk (biased towards higher values, but its strongly random nonetheless).

I don't think that's a fair characterisation, and it could be misleading for people new to SA. I'd say SA is a hill-climbing algorithm that has a probability of accepting an inferior move, a probability which shrinks asymptotically over time.
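
(In sketch form, with illustrative parameters: always take improvements, accept a worse move with probability exp(-delta/T), and let the temperature T decay so the walk anneals into a hill climber.)

    import math, random

    def anneal(f, x, neighbour, t=1.0, cooling=0.999, steps=10000):
        for _ in range(steps):
            cand = neighbour(x)
            delta = f(cand) - f(x)                    # minimising f
            if delta < 0 or random.random() < math.exp(-delta / t):
                x = cand                              # sometimes accept worse moves
            t *= cooling                              # acceptance probability shrinks
        return x

    # e.g. a rugged 1-D function with many local minima:
    best = anneal(lambda x: x * x + 10 * math.sin(x), 5.0,
                  lambda x: x + random.gauss(0, 1))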


Random walks work well in high-dimensional spaces, where calculating the gradient gets to be very expensive. Depending on your metaparameters, a random walk can also try to jump out of a local minimum, whereas gradient descent only optimizes within the initial local minimum.


Absolutely not. Random search does not work in high dimensions because of the "curse of dimensionality": the number of directions to search grows exponentially with dimension. Gradient descent avoids the problem because it knows the right direction.

Evolutionary strategies are doing (natural) gradient descent using an estimate of the gradient. Here's a good article: http://www.inference.vc/evolutionary-strategies-embarrassing...


Gradient descent can also jump out of a local minimum if the learning rate is large enough, so they're equal in that sense.

But it does make sense that a random walk would be more efficient in very high dimensional problems!


Turing proposed using genetic algorithms to evolve neural networks, and realizing the same issues you have, he then studied those underlying problems more in general (in particular, morphogenesis).


What are your thoughts about Grammatical Evolution? It cleverly uses BNF to constrain the possibilities to legitimate ones.


That was my first thought as well; GE is the easiest way I know of to introduce domain-specific knowledge into the genotype-to-phenotype translation.
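
(A minimal sketch of that genotype-to-phenotype mapping, with an illustrative grammar: integer codons pick productions from a BNF grammar, so every decoded individual is syntactically valid.)

    # Codons (ints) choose productions; expansion stops when the codon
    # budget runs out (a simplification of GE's wrapping rule).
    GRAMMAR = {
        "<expr>": [["<expr>", "<op>", "<expr>"], ["x"], ["1"]],
        "<op>":   [["+"], ["-"], ["*"]],
    }

    def decode(genome, start="<expr>", max_codons=50):
        out, stack, i = [], [start], 0
        while stack and i < max_codons:
            sym = stack.pop(0)
            if sym in GRAMMAR:
                rules = GRAMMAR[sym]
                stack = list(rules[genome[i % len(genome)] % len(rules)]) + stack
                i += 1
            else:
                out.append(sym)
        return " ".join(out)

    print(decode([3, 1, 0, 2]))   # -> "x + 1"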


I also thought of that problem, and I created a language for representing agents in these systems. https://github.com/romankierzkowski/langner Unfortunately, I didn't carry on that work, so I haven't evaluated that kind of representation yet. I would love to know your opinion on Langner.


You know the counterintuitive thing I found? Sexual reproduction sucks. It has nowhere near the performance gains I'd expect. In order to really cover the gamut of a solution space, only asexual reproduction has good, consistent results. I've learned a lot more about why that is probably the case now, but I spent a long time getting to the bottom of it.


I think that depends on your genotype-phenotype map. It's hypothesized that sexual reproduction helps drive modularity because evolution needs to solve the problem of how to bring together two genetically different individuals and still come out with a viable offspring. This is why robustness is so entwined with evolvability. If your genotype-phenotype map is simple then robustness doesn't need to evolve, but this may limit evolvability. The thing I always found with evolution is that there is no free lunch.


Cliff-hanger! Can you elaborate on why asexual reproduction might perform better?


My theory is that sexual reproduction focuses genes on performance in a stable environment, and asexual reproduction grows to encompass the entire gamut of every survivable environment.

When you don't know the traits that are beneficial in an environment, a significant change adversely affects all sexually reproduced algorithms to a significant degree, whereas it affects asexually produced algorithms to varying degrees.


>Sexual reproduction sucks.

There is a joke to be made here.


Plants just wanna have fun?


> evolution of evolvability and then a seemingly unrelated subject: the evolution of robustness.

Isn't evolvability only about robustness? What other criteria would improve evolution? Mutations just happen, so the question is how well the phenotype can deal with these mutations. If it can incorporate mutations well, it can tunnel to different useful phenotypes and therefore is robust.


Agree, but it's not obvious to most people. Robustness itself is a murky subject: are you robust to mutation, to the environment, to genetic noise, to genetic background? Also, if you are too robust then you can't evolve. What we want to say with evolvability is that we're more predisposed to positive selection.


Do you think creating a 3D universe (like that being worked on by open.ai), putting the evolvable agents in it and trying to make them evolve learning and intelligence is a good idea?

I mean starting at a very primitive level, you could first aim for something like ant-level intelligence, then go to higher and higher intelligences from there.


I think this type of discussion is really detached from the reality of what neural networks, AI or evolutionary algorithms are.

These are mathematical models that optimise a specific fitness function, they borrow some biological concepts but they are not going to "evolve into ant-level intelligence".


> they are not going to "evolve into ant-level intelligence"

How do we know until we've tried it?

I mean, at a certain level of abstraction, you can just view humans as a bunch of atoms that have been placed in a heat bath for a very long time. And yet intelligence seems to have spontaneously self-organized from it.


Again, this is very detached from what ANNs or genetic algorithms are: relatively simple methods to optimise fitness (or "cost") functions.

They are nowhere close to the complexity of a brain, and certainly do not have enough capacity to be a realistic simulation of the universe.

> How do we know until we've tried it?

Tried what, is the question? What's an ant-level intelligence? We have algorithms that can detect images/speech at super-human level (at least in some conditions). Worthy questions, but unrelated to these algorithms.


I'm not sure that ants are doing a "realistic simulation of the universe", though, or anything like it. From what I understand, they're a lot closer to little robots than they are to something that can do Kip Thorne-level astrophysics, or Thomas Aquinas-level philosophy.

I am thinking, in particular, of an experiment done by Richard Feynman where he got ants walking across small sheets of glass. After shuffling the sheets around so the chemical trail became a loop, he was able to get the ants tirelessly marching around in a circle (which they would presumably do until they dropped).

Heck, you probably don't even need an ANN at all to do that, much less an evolutionary one.


You could view it that way, but it won't do you much good, since biology is a long way from physics, and humans didn't spontaneously emerge from a long standing heat bath.


"humans didn't spontaneously emerge from a long standing heat bath."

They didn't? I'm not sure what you're saying here. Are you arguing intelligent design or something? Because the standard evolutionary theories postulate that humans did, in fact, spontaneously arise from a heat bath (the heat coming from the Sun).


If by "spontaneous", you mean a process taking several billion years which included heat from the sun, but also resources on Earth, with billions of previous ancestral forms from simple replicators to complex multicellular organisms, then okay.

That's not what I typically think of when I hear the word "spontaneous". Spontaneous emergence sounds to me like lightning hitting a swamp over a period of time, and then by a pure chance arrangement of molecules, a fully formed swamp man comes out of it.

That's way different than an evolutionary process.


But that's what the discussion is about here:

> Do you think creating a 3D universe (like that being worked on by open.ai), putting the evolvable agents in it and trying to make them evolve learning and intelligence is a good idea

Basically simulating a universe, seeded with some things we think are already somewhere along the road to intelligence, then seeing if progress is made just by simulating for a long time.


Why am I getting downvoted for this? Evolution isn't spontaneous generation. Evolution disproved any notions of spontaneous generation a century or more ago. All life on Earth evolved via a causal process, not random arrangement of atoms.


Given a fitness function that is the same as an ant's, it seems plausible that they would.

The question is, what is that function?


> Given a fitness function that is the same as an ant's, it seems plausible that they would.

> The question is, what is that function?

Not necessarily. Everything points to the fact that the human brain (at least, I don't know much about ants) does not work in any way similar to neural networks, and as such there's no guarantee that you can represent its behaviour by the current algorithms.

For example, real neurons have no supervision signal. Memory is also a big issue (see the work being done by DeepMind and FAIR on differentiable neural computers and memory networks), and Reinforcement Learning still struggles with long-term planning, switching strategies, etc.


> Not necessarily. Everything points to the fact that the human brain (at least, I don't know much about ants) does not work in any way similar to neural networks, and as such there's no guarantee that you can represent its behaviour by the current algorithms.

Yes, the brain is most likely not simply modelling one giant ANN, but it's much more plausible that ANN-like components have a role in it.


If there is a space of optimal policies for the ant, then both the ANN and real neural systems will be attracted there. There are certainly a lot of local minima, but it would be surprising to me if ANNs and real neural systems could only find totally disjoint local minima with no similarities.


>and as such there's no guarantee that you can represent its behaviour by the current algorithms.

Doesn't the universal function approximation theorem provide just such a guarantee? It doesn't guarantee any algorithm will converge on that representation, but the capability to represent it is there.
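
(For reference, the classical statement, roughly as in Cybenko/Hornik: for any continuous f on a compact set K, a sigmoidal, or more generally non-polynomial, activation \sigma, and any \varepsilon > 0, there exist N and weights v_i, w_i, b_i with)

    \sup_{x \in K} \Big| f(x) - \sum_{i=1}^{N} v_i \, \sigma(w_i^\top x + b_i) \Big| < \varepsilon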


It’s almost certainly not just one function.

An ant’s fitness function is different from its ancestor’s fitness function, which is different from its own ancestor’s fitness function, which is different from...

You don’t get from zero to complexity with a single fitness function.

The question we should ask is: How do we vary the fitness function over time in order to evolve something complex?

I have no idea what the answer is, but...

“If you know the question, you know half.”

-Herb Boyer, geneticist


Balanced homeostasis? Maybe we could just start with that - it might give an organism life, but it would be interesting to see if emergent social behaviors came from a group of agents optimizing homeostasis alone.



Open-ended evolution is an unsolved problem. You need the right genotype-phenotype map. That's a hard problem, and even if I was lucky and looking in the right direction, it's probably way beyond my IQ.


I assume this is the Peter Eggenberger you are referring to?

https://www.semanticscholar.org/author/Peter-Eggenberger-Hot...


Looks like the same guy.


While perhaps not immediately pertinent to the problem, the emergence of sexual reproduction is credited as a major factor in the explosion of variation in multicellular organisms. Part of the advantage, ironically, is that sexual reproduction is a rather large burden compared to evolutionary strategies that came before. The fact that sexual reproduction is so challenging perhaps ensures that less is left to chance by filtering out individuals that cannot meet the burden and rewarding those that tend to be better at achieving reproduction itself. Likely there is some relation between improved reproductive success and some novel traits that helped to achieve it. I don't have the knowledge to weigh in on AI analogs, but I could imagine roughly a strategy that involves co-related burdens and goals to improve the chances of choosing the right individuals.


I think, the main effect of sexual reproduction is that, much like GANs and competitive self-play, it creates species-internal competition: Both sexes need to impress, which makes cheating an obvious strategy (makeup, steroids, Shakespeare quotes, LISP etc., but many such examples can be found in the animal world), and hence both sexes also need to be able to detect cheating. Some species are rather asymmetric in that regard. For example, in humans it is mainly women who attract (they masquerade as fruits [make up is likely a cross-cultural phenomenon; and, well, breasts] tapping into male food gathering circuitry); men compete in hierarchies trying to impress and women select men from the top of the hierarchy. Complex dynamics emerging from this likely lead to the immense growth of the human cortex.

Sexual reproduction basically outsources some of the selection effort to the cognitive apparatus of the species itself, thereby introducing a massive amount of additional selection signals (mainly by the much increased necessity to model other minds, namely minds of the opposite sex). Many of these signals promote traits that are useful for survival (mainly intelligence and health).


I was thinking the same thing. I think it would be interesting to explore this area more and try to model it computationally.

Your point about physical features made me think of how physical attractiveness plays into human development as well.

Research has shown that beauty in humans is strongly associated with physical symmetry. So "novelty" in our case might be defined as someone who is really ugly - the Elephant Man.

So in this case, beauty wouldn't really fit into the robot walking example, as it's neither fitness (moving the foot forward) nor novelty. More a different type of fitness that increases the odds of reproduction.


My guess would be that symmetry is a simple heuristic measure of physical fitness. Visual attraction is basically a strong regularizer that restricts the search space to phenotypes with particular traits. Asymmetry means that the joints wear out more quickly and muscles might not coordinate optimally leading to less strength and a reduced ability to hunt and to fight predators. AFAIK it is also a quite robust predictor of all kinds of diseases because it often means that the growth signalling is out of tune throughout the system. Visual selection basically performs environmental selection more immediately and more effectively: an asymmetric person might still survive, but its offspring has a lower overall chance to survive. The teaching signals of that are much weaker.


Good points. Never thought of it that way.


For those new to the subject, here is a nice visual primer to Evolution Strategies.

http://blog.otoro.net/2017/10/29/visual-evolution-strategies...


Thanks for that. My undergrad was Industrial Engineering but I always loved the Operations Research stuff (which I'm hoping will lead me to more CS-y stuff if I can get a job or a Masters or something). All that OR with LPs and MIPs was a headache, but somehow stochastic programming, GAs and NNs were scary and I never even tried to understand them. I always thought people describing them as simple were probably geniuses or something. Maybe one day I'll understand the second half of that link.


How do evolutionary hyper-parameter optimizations compare to Bayesian ones? My impression always was that Bayesian optimization is more targeted, and therefore more efficient and ultimately finds optimal parameters faster. However, evolutionary algorithms are easier to parallelize, so maybe EAs indeed have a place in a non-research-oriented, applied DL setting?


Covariance matrix adaptation is comparable on real-valued hyperparameters, but you're stuck outside of that. More typical genetic algorithms waste a TON of compute time, so they might end up finding a good solution, but the computational budget is out of reach of normal companies.


I came up with a similar idea recently; it figures that every original idea I ever have turns out to be not original and decades old.

Perhaps the lack of success so far has been due to the fact that the faculties of the brain we attribute to higher-order complex thought are themselves based upon brain regions that are much older, evolutionarily speaking. So it's not just one system you have to evolve; it's many hierarchical subsystems. And given the amount of time it took before human-level intelligence arose in the animal kingdom, it's pretty clear that randomly evolving a human-level intelligence isn't something that's going to happen very often.

It also seems to me that the only reason intelligence evolved in the first place was because of the complex environment provided by our natural world. I have a feeling that we're not going to see much in the way of intelligent learning systems unless we can provide those systems with a sufficiently complex environment to learn and evolve within.


My personal hypothesis is that strong AI will evolve when Google decide to add live feedback to google search suggestions.

At the point that system can unprompted change what it shows the user, it'll be interactive enough that we will presumably just have to wait for adequate computational backing for it to "wake up".


I found this article really intriguing as someone new to this field.

If I'm understanding this right, it seems like most approaches up to this point are focused on evolving a single "unit" or brain / person.

You have the concept of "nature" that selects which units will advance to the next generation and pass on "DNA": first based on fitness, and now, with the new approach, on novelty.

This may seem a bit naive, but has anyone explored adding a social dimension?

For the robot walking example, you have nature choosing novelty and those that made progress.

A basic social dimension could have units observing other units and sharing information.

But then there's a variety of other dimensions - a unit blocking another unit from walking (cheating), a bigger unit destroying a smaller one, maybe success via misc factors like physical symmetry / popularity.

Idk. Just curious to learn more and where the research is at.


This is mostly in the realm of Artificial Life. Check out Chris Adami's work from the late 90s as a starting point.


Awesome, thanks! I've been reading up a bit and this is exactly what I was hoping to find.


I believe competition could be a big force towards progress. GANs are based on this topic and are very successful.


Yeah, I figured that. I imagine it's a bit challenging to model though.

You'd have to be running every instance simultaneously and have them all observing and reacting to each other.

Also gets into some interesting "morality" type of questions. As in if an instance cheats to get ahead, are there consequences?


One thing I find interesting to note is that modern day policy gradient deep reinforcement learning methods are incredibly similar to evolutionary algorithms and DRL methods have been applied quite successfully to neural architecture search.


I wonder why evolutionary algorithms never managed to be that successful.


They're quite successful on complex multiobjective problems, but you need a lot of compute power to cover the solution space sufficiently. Most problems either 1) don't have such stringent requirements on multiobjective optimization (i.e. the problem can be recast with a single objective function, likely one with a gradient), or 2) more typical machine learning algorithms are "good enough" and require less compute power. But for instance designing complex geometries under numerous constraints is well-suited to evolutionary algorithms. EAs also shine when the evaluation for fitness is computationally-intensive, as you can defer the fitness evaluations until absolutely necessary.


My understanding is MOEAs are not a whole lot better than random search.


Nice article, lots of good tidbits and insights. The thoughts on indirect encoding seem like an especially interesting area of research. I actually just finished up a project that used a genetic algorithm to tune some of the hyperparameters of an RNN [1]. I didn't evolve the architecture except which cell to use (LSTM or GRU), but it wouldn't be much of an extension to the proof of concept I coded. Thanks for sharing.

[1] http://cole-maclean.github.io/blog/Evolving%20the%20StarCraf...


I wonder how much damage (physical damage) comes into play with evolving robust systems and creating interesting variations in nature. Using the robotic gait example, in the real world organisms have a high enough chance of losing limbs that having the neural circuitry to survive in the absence of one or more limbs is advantageous. Maybe these simulations could increase their ability to produce robust networks by modeling accidental damage (from tiny to catastrophic) to random parts of networks and selecting those that survive in spite of such damage.


See the work of Josh Bongard and Hod Lipson, examining exactly this topic.

http://science.sciencemag.org/content/314/5802/1118


Do you see many wild animals with missing limbs?

Especially mammals


Like dropout or stochastic depth?


Author of SharpNEAT here if anyone has any questions.


What would be your recommendation to learn about these topics ? Moocs, books, having some pet project ?


I don't know of any online teaching material; I think this is because it is a relatively niche and under-explored area of research, which is one of the main reasons I continue to maintain and develop SharpNEAT.

What has been useful for me is just setting up experiments and seeing what happens, and observing the way things go wrong. Typically evolution will find the flaws in any fitness score (metric) that is defined, often resulting in solutions that give good scores, but only because evolution was 'cheating', or rather, exploiting the flaws in the metric. This problem comes up a lot, and it is educational to go through that process and try to think up more robust tasks and ways of measuring success on those tasks. Ideally we want metrics that are continuous, so that we can 'follow the gradient' of success rather than e.g. having a fitness score just go from zero to a perfect score in one step.
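
(A minimal illustration of that last point, with made-up numbers: a binary solved/not-solved score is flat almost everywhere, while a shaped score awards partial credit that evolution can follow.)

    def fitness_binary(distance_walked, goal=10.0):
        return 1.0 if distance_walked >= goal else 0.0   # flat almost everywhere

    def fitness_shaped(distance_walked, goal=10.0):
        return min(distance_walked, goal) / goal         # a gradient of success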

Also, it's been educational seeing how networks evolve. One of the early modifications I made to NEAT was to periodically disable additive mutations (add nodes and connections), so as to strip away any redundant structure that could be slowing down evaluation of each neural net. This came from just seeing the networks grow rapidly, and not always with fitness score gains to show for it.

In summation, I don't have a good answer for you, but if you can get interested in setting up an experiment and seeing how evolution tries to solve it all by itself, then maybe you'll get the 'bug' and start to think about where the limits to this approach are, and different ways of overcoming some of the present limits. From there you might want to pick through the occasional research paper to see what others are doing. I've found the NEAT-based research to be quite accessible, e.g. not requiring a lot of maths knowledge, as is the case with deep learning and the like. Although I admit I do follow that world and have had to learn quite a bit of maths to keep up, and I think that's useful for drawing ideas from that area.

Hope that is of some help anyway.


Evolutionary algorithms are perennially intriguing, but generally other approaches are favored. Odd, considering the incredible success of evolution with extremely limited trials from a computational perspective. I wonder why evolutionary computation does not live up to the expectation?


> I wonder why evolutionary computation does not live up to the expectation?

There's also a social factor here, I think. Neural networks were disregarded due to over-claims and also perhaps due to some bias for machine learning methods that relied on understandable statistical models. In the same way, evolutionary computing is often looked down upon by the rest of the AI community - it's a lot of hacking and hand waving and "try it and see". It doesn't help that there's been a lot of low-quality work in EC. But this bias has perhaps starved EC of talent somewhat; much of the progress in deep learning is probably a result of increased research activity and optimism, i.e. progress is self-reinforcing.


I'd guess limited computing resources. Nature has had billions of years times trillions of "experiment" instances to reach its amazing current state.


Not only that, but a virtually unbounded population space to test. Count every single organism alive today.


Our top supercomputers can process tens of petaflops. That seems to easily dwarf these biological time scales you mention. There appears to be some magic in evolution that surpasses mere trial limitations.


I'd like to double down and say the top supercomputers today couldn't, in reasonable time, simulate a single generation of the genetic variation in your gut flora alone.


Sure, but that means the magic does not lie in evolution, but in the extreme amounts of information in biological organisms. Evolution is supposed to be able to generate incredibly complex organisms from very simple beginnings, all within a number of trials that can be simulated on today's desktop. Insofar as we displace the problem from evolution to the organism or environment, we are saying there is nothing special about evolution. In which case, it raises the question: what is so special about evolution in the first place?


Are these numbers really outside the range of modern computers?


There's a distinct gap between "most powerful supercomputers available" and "compute clusters typically used for this sort of problem."


There are teraflop processors for the desktop now. Our standard 60 GFlop processors will take less than a minute to do a trillion operations. I don't see the computational bottleneck you refer to.

At least from a viewpoint of number of trials, we can easily replicate the number of trials evolution has to work with using an everyday desktop. Why don't we see comparable evolution in the desktop? There is some sort of magic to evolution we have not yet fathomed.


> At the time, its small group of practitioners thought it might be an alternative to the more conventional ANN training algorithm called backpropagation (a form of stochastic gradient descent).

The author doesn't seem to understand what backpropagation is: gradient descent is performed using the gradient that backpropagation computes; they are not forms of one another.
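
(A toy illustration of the distinction, with made-up numbers: the gradient computation and the descent update are separate steps.)

    # Toy example for loss(w) = (w * x - y)^2.
    x, y, w, lr = 2.0, 10.0, 0.0, 0.1
    for _ in range(50):
        grad = 2 * (w * x - y) * x    # "backpropagation": compute the gradient
        w = w - lr * grad             # "gradient descent": apply the update
    print(w)                          # -> ~5.0, since 5.0 * 2 = 10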


Yep, and SGD is not necessarily implied either (it could be, for example, full-batch gradient descent).


Isn't detecting unlimited novelty impossible due to Chaitin's incompleteness theorem?



