> To find these patterns, a growing number of geneticists are turning to a form of machine learning called deep learning. Proponents of the approach say that deep-learning algorithms incorporate fewer explicit assumptions about what the genetic signatures of natural selection should look like than do conventional statistical methods.
In the days when Sussman was a novice, Minsky once came to him as he sat hacking at the PDP-6.
“What are you doing?”, asked Minsky.
“I am training a randomly wired neural net to play Tic-Tac-Toe” Sussman replied.
“Why is the net wired randomly?”, asked Minsky.
“I do not want it to have any preconceptions of how to play”, Sussman said.
Minsky then shut his eyes.
“Why do you close your eyes?”, Sussman asked his teacher.
“So that the room will be empty.”
At that moment, Sussman was enlightened.
I think AlphaGo vs AlphaZero is a strong argument against this. AlphaGo used the best knowledge of humanity to tune its play, mixing human expertise and centuries of master-level games with the strength of deep learning systems. That seems like it would be ideal, particularly per your analogy, and Google certainly believed it: AlphaGo is the system they directed their resources towards developing and then very publicly demonstrating.
AlphaZero was likely a curious aside at one point. The idea behind AlphaZero is to throw everything out of the window. Even the game itself is generalized into a generic position -> transition -> terminal-state system into which most rule-simple games can easily be slotted. There were no human heuristics, no human games to learn from; instead it relied entirely on self-play training, and so on. Of course, AlphaZero ended up becoming vastly stronger, vastly faster, than AlphaGo ever managed to.
AlphaZero heavily reused architecture developed for AlphaGo, and in this sense some prior is embedded in the architecture; but to jump-start development of that fine-tuned architecture, human priors were useful. I.e. the path from nothing to AlphaZero would have been more difficult than nothing -> AlphaGo -> AlphaZero, imho.
I increasingly feel that deep learning needs to incorporate more ideas from evolution, not just for parameter optimization but for architecture discovery itself.
Imagine pitting neural networks against one another in an adversarial environment (just like the real world). Under competitive pressure for limited food (computation resources), evolved neural networks could start approaching optimal architectures that do the job but have no superfluous, pre-conceived notions of the (modeled) world. In fact, such evolved architectures could encode relevant notions of the world directly, which we could learn about by reverse-engineering them.
This is closely related to ideas from predictive processing, which ties survival to prediction (to predict your future states is to avoid getting dissipated). So I anticipate evolution and survival notions coming up in a big way in ML/deep learning in the future.
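To make that concrete, here is a toy numpy sketch of the kind of loop I mean (everything here is made up for illustration: the task, the mutation scheme, and the "food" budget, which I model as a simple tax on parameter count):

    import numpy as np

    rng = np.random.default_rng(0)

    # Toy task: approximate y = sin(x) on [-3, 3].
    X = rng.uniform(-3, 3, size=(200, 1))
    y = np.sin(X)

    def init_net(hidden):
        # One-hidden-layer MLP; 'hidden' is the evolvable architecture choice.
        return {"W1": rng.normal(0, 1, (1, hidden)), "b1": np.zeros(hidden),
                "W2": rng.normal(0, 1, (hidden, 1)), "b2": np.zeros(1)}

    def train_and_score(net, steps=300, lr=0.05):
        # Crude full-batch gradient descent, just enough to score a candidate.
        for _ in range(steps):
            h = np.tanh(X @ net["W1"] + net["b1"])
            err = h @ net["W2"] + net["b2"] - y
            dW2, db2 = h.T @ err / len(X), err.mean(0)
            dh = (err @ net["W2"].T) * (1 - h ** 2)
            dW1, db1 = X.T @ dh / len(X), dh.mean(0)
            net["W1"] -= lr * dW1; net["b1"] -= lr * db1
            net["W2"] -= lr * dW2; net["b2"] -= lr * db2
        h = np.tanh(X @ net["W1"] + net["b1"])
        return float(np.mean((h @ net["W2"] + net["b2"] - y) ** 2))

    def fitness(hidden, compute_tax=1e-3):
        # "Limited food": bigger nets pay for their parameter count.
        net = init_net(hidden)
        mse = train_and_score(net)
        n_params = sum(w.size for w in net.values())
        return -mse - compute_tax * n_params

    # Evolve the architecture (hidden width) itself, not just the weights.
    population = list(rng.integers(1, 32, size=8))
    for generation in range(10):
        parents = sorted(population, key=fitness, reverse=True)[:4]
        children = [max(1, p + int(rng.integers(-4, 5))) for p in parents]
        population = parents + children
    print("surviving hidden widths:", sorted(int(h) for h in population))

The idea is that the compute tax pushes the surviving widths toward the smallest architectures that still fit the task, i.e. ones without superfluous machinery.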
> Imagine pitting neural networks against one another in an adversarial environment (just like the real world). Under competitive pressure for limited food (computation resources), evolved neural networks could start approaching optimal architectures that do the job but have no superfluous, pre-conceived notions of the (modeled) world.
Sounds almost like these robots [0], which learned to lie to each other. They didn't use a neural net, though.
> approaching optimal architectures that do the job but have no superfluous, pre-conceived notions of the (modeled) world
Usually, modelling efficiency is related to the amount of priors. The more (correct) priors, the faster a model learns, and from less data.
A prior-poor model just spends data learning those correct priors in the first place.
There are countless examples I could give to support this intuition. There isn't, AFAIK, any theoretical work on it yet.
That the priors are correctly aligned with reality is vital. A model with fixed bad priors cannot unlearn them. If the priors are bad but not fixed, you're just spending time and data to correct those priors.
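A minimal numpy sketch of that point on a toy problem (the data and degrees are arbitrary): the ground truth is linear, the "correct prior" model is a degree-1 fit, and the prior-poor model is a high-degree polynomial that has to spend data ruling out hypotheses the other model never entertains.

    import numpy as np

    rng = np.random.default_rng(0)

    def true_fn(x):          # the ground truth happens to be linear
        return 2.0 * x + 1.0

    x_train = rng.uniform(-1, 1, 8)
    y_train = true_fn(x_train) + rng.normal(0, 0.1, 8)
    x_test = np.linspace(-1, 1, 200)

    for degree in (1, 7):
        # degree=1 carries the correct prior ("the world is linear");
        # degree=7 is the prior-poor model: it exactly interpolates the
        # 8 noisy points and has to be talked out of wiggling by more data.
        coeffs = np.polyfit(x_train, y_train, degree)
        test_mse = np.mean((np.polyval(coeffs, x_test) - true_fn(x_test)) ** 2)
        print(f"degree {degree}: test MSE = {test_mse:.4f}")

With only a handful of noisy points, the model carrying the correct prior typically generalises far better; give both enough data and the gap closes, which is exactly the "spending data to learn the priors" cost.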
We know all of the rules of tic-tac-toe. We have no idea how much we don't know about genetics.
The things we know can still be incorporated into deep learning in various ways, but we avoid having to make as many assumptions that are obviously not universally applicable, as we would with less complex statistical methods.
> we avoid having to make as many assumptions that are obviously not universally applicable
The point of the above koan is that there are always assumptions (priors), whether you like them or not. You can make them explicit, or leave them random and unknown. This is the crux of the bias-variance tradeoff [1]. Whether deep NNs manage to circumvent that in any meaningful way (especially outside the domain of spatial/temporal analog signals) is an open question.
Exactly. There is simply no way for a human to understand this stuff without (some layers of) abstraction, and each abstraction introduces bias and/or error.
How could I embed knowledge or assumptions in deep-learning neurons? In a high-level programming language it would be easy, but tweaking the neurons' parameters to embed that knowledge? Sounds more difficult than writing machine code.
Any deviation from a series of fully-connected layers represents some assumption being made, usually to reduce the size of the parameter space to a subset that is considered more promising.
Convolutions are one example: they assume that proximity correlates with a logical connection.
Note that this is a very useful assumption. Just shuffle the pixels in a photo and try to discern what they show to see how much we rely on that assumption. In fact I'm having trouble coming up with an obvious counterexample[0].
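As a rough illustration of how much that one assumption buys, compare the parameter counts of a small conv layer and a dense layer producing the same output shape (the shapes below are arbitrary):

    import torch.nn as nn

    # Two ways to get 8 feature maps of 26x26 from a 28x28 greyscale input:
    conv = nn.Conv2d(in_channels=1, out_channels=8, kernel_size=3)  # local, shared weights
    dense = nn.Linear(28 * 28, 8 * 26 * 26)                         # every pixel talks to everything

    count = lambda m: sum(p.numel() for p in m.parameters())
    print("conv :", count(conv))   # 80 parameters
    print("dense:", count(dense))  # ~4.2 million parameters

    # The ~50,000x reduction is bought entirely by assuming that nearby pixels
    # belong together and that the same local pattern matters everywhere.
    # Apply a fixed random permutation to the pixels and that assumption is
    # wrong: the dense layer wouldn't care, the conv layer would suffer.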
So let's not fall into the trap of these armchair scientists with the big spliff, staring into the distance and intoning trivialities with the air of revelation: "Man.... you're just a slave to your assumptions. What if, like, space and time are one and the same?"
In fact, one could argue that all of AI is an endeavour to find abstract rules defining what's "trivially obvious" to us. You don't have to explain to children that objects in the distance are smaller than when they are close.
Once you succeed with that, it's possible that ML can find a sort of post-modern reality. One that we are blinded to for cultural reasons and the structure of our perception: what if God, for example, appears in the form of seemingly random "pixel errors"? You would easily miss her constant presence due to all the error correction in the pathway of your perception (and also your camera sensors).
But that's the future. Just as art often flourishes within the confines of (often arbitrary) limitations, so do we. And embracing these limitations is not done out of ignorance, but out of expedience.
Depends. Many standard layers express a form of prior knowledge. A CNN layer embeds the assumption of spatial translation invariance; an RNN does the same for temporal translation. Graph neural nets have permutation invariance. Assumptions can also be expressed as regularisation terms added to the loss function. One common practice is to initialise a net with the weights of another net trained on a related task - usually CNNs trained on ImageNet, or word embeddings for NLP (though lately it is possible to use deep neural nets such as BERT, ELMo, ULMFiT and the OpenAI transformer pre-trained on large text corpora).
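For the last two of those (regularisation terms and initialising from a related task), a minimal PyTorch sketch, with hypothetical stand-in networks rather than a real ImageNet backbone:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    # Hypothetical stand-ins: 'pretrained' plays the role of a net already
    # trained on a related task (the usual ImageNet-backbone situation).
    pretrained = nn.Sequential(nn.Linear(100, 50), nn.ReLU(), nn.Linear(50, 10))
    model      = nn.Sequential(nn.Linear(100, 50), nn.ReLU(), nn.Linear(50, 10))

    # 1) Embed knowledge via initialisation: start from the related task's weights.
    model.load_state_dict(pretrained.state_dict())

    # 2) Embed knowledge as a regulariser: penalise drifting away from that prior.
    def prior_penalty(net, prior, strength=1e-3):
        return strength * sum(((p - q.detach()) ** 2).sum()
                              for p, q in zip(net.parameters(), prior.parameters()))

    x = torch.randn(32, 100)
    target = torch.randint(0, 10, (32,))
    loss = F.cross_entropy(model(x), target) + prior_penalty(model, pretrained)
    loss.backward()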
It's not just black-box modelling of variants to hunt for mutations in a population anymore, but full functional design of RNA molecules for regulation and expression ;)
Hm, please correct me if I'm wrong, but isn't the Zuker algorithm inherently a dynamic programming algorithm? If so, I would have liked to see this paper compare to directly optimizing the input space using a differentiable form of the Zuker algorithm, e.g., using ideas related to https://arxiv.org/abs/1802.03676.
IMO it's unnatural to rely on deep reinforcement learning when we know so much about the mapping between input space and reward space (here, because Zuker's algorithm is not a black box). To me, deep RL is primarily for the domain where the mapping is highly complex and largely black box. Deep RL might be more useful if the authors didn't use Zuker's approximate algorithm but instead performed molecular simulations to determine secondary structure folding that is more faithful to reality.
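For what "a differentiable form" of a dynamic program could look like, here is a toy sketch in the spirit of that paper: a generic grid DP with min() relaxed to a soft minimum so gradients flow back to the inputs. (This is not Zuker's recursion, which is considerably more involved; it's just the core trick.)

    import torch

    def softmin(values, tau=1.0):
        # Smooth relaxation of min(); approaches the hard min as tau -> 0.
        return -tau * torch.logsumexp(-torch.stack(values) / tau, dim=0)

    def soft_grid_dp(cost, tau=0.1):
        # Toy DP: cost of the cheapest monotone path from (0,0) to (n-1,m-1),
        # with min() replaced by softmin() so the value is differentiable
        # with respect to every entry of the cost matrix.
        n, m = cost.shape
        D = [[None] * m for _ in range(n)]
        for i in range(n):
            for j in range(m):
                preds = []
                if i > 0: preds.append(D[i - 1][j])
                if j > 0: preds.append(D[i][j - 1])
                if i > 0 and j > 0: preds.append(D[i - 1][j - 1])
                D[i][j] = cost[i, j] + (softmin(preds, tau) if preds else 0.0)
        return D[n - 1][m - 1]

    cost = torch.rand(6, 6, requires_grad=True)
    value = soft_grid_dp(cost)
    value.backward()                 # gradients w.r.t. every cell of the DP's inputs
    print(value.item(), cost.grad.shape)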
No, just no. It is not enlightenment to become isolated in pure math rules; it is to become an automaton. Integration with a living, active world is not optional. Read Minsky yourself, minus the hero-worship... very similar response to Kurzweil and a few others, btw.
The same trick as DeepVariant: if you encode your genetic variants as images and give that to a CNN, you get reasonably good results without doing much extra work!
DeepVariant didn't actually explicitly encode the variants (nor the raw data) as images. Press reports (including from Google Research itself) suggested that this was the case, but nothing in the original publication said so, and the researchers themselves have disputed it.
It just so happens that the tensor representation lends itself well to visualisation as (multi-channel) images. But that wasn’t the intent, it’s just a nice side-effect. In reality the data is laid out in tensors that follow naturally from how they were generated (i.e. via alignment of many stochastically generated DNA fragments to a reference sequence).
Yeah, the way they talk about this is admittedly confusing. The best way for me to think about it (from [1]) is to mentally add quotation marks around the word "image" whenever the methods mention it. Mathematically the data is stored in higher-dimensional tensors, with one dimension representing genomic coordinates, another dimension representing the sequencing depth (i.e. one row per sequencing read), and additional dimensions representing features of the sequence (such as nucleotide identity, read direction, error probability, match/mismatch with the reference, etc.). I use almost the same kinds of tensors for work that has no tangible relation to images or visualisation. It's simply the most straightforward way of representing this data as a tensor.
But the similarity to images is so tantalising that even the paper's methods [2] fall prey to this, even though the dimensionality and minor details are wrong. Furthermore, the term "pileup image" refers to a common way of visualising genome–read alignments [3]. The DeepVariant tensor is not a pileup image, but it is very close. And the tensor can be converted into an image [4], but as mentioned this requires some transformations (splitting the channels and rescaling the values).
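If it helps, here is a rough numpy sketch of that kind of layout (the window width, read count and channel names are only illustrative, not DeepVariant's exact choices):

    import numpy as np

    # Rough sketch of a DeepVariant-style input tensor, NOT the tool's exact
    # layout or channel set: one candidate variant site, a window of genomic
    # positions around it, and one row per overlapping sequencing read.
    n_positions = 221            # window width around the site (illustrative)
    n_reads = 100                # max reads kept per site (illustrative)
    channels = ["base_identity",      # which nucleotide the read reports
                "base_quality",       # sequencer's confidence in that base
                "mapping_quality",    # confidence in the read's alignment
                "strand",             # read direction
                "matches_reference"]  # agreement with the reference sequence

    pileup = np.zeros((n_reads, n_positions, len(channels)), dtype=np.float32)
    print(pileup.shape)   # (100, 221, 5) -- reads x positions x channels

    # Nothing here is inherently an "image"; it just happens that reads-by-
    # positions-by-channels renders nicely as a multi-channel picture.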
Thank you for the explanation, it makes perfect sense: just as an RGB picture is represented as a three-dimensional array of numbers for red, green and blue, you're not constrained to those three channels; you might as well put anything in there.
I would be interested to hear more about how you overcome these cases when you do not have enough samples to train on. This seems like a useful area to improve upon.