Such an important concept, and a title that goes for clever rather than clarity. I'm glad somebody finally published this. Short version:
- words are the symbolic indicators of thought vectors
- each word carries with it a probabilistic stream of potential further thoughts and links to past symbols.
- much like implicit CFD, they are backward convolved with prior words to determine the most likely hidden thought, and then forward solved to determine the next word (a toy sketch of this follows after the list).
- further, these streams are described with formal logic relationships based on the identities of the included words, which can have levels of "meta-identity" (i.e. I can't know some pair are brother and sister without having been given the idea of bros/sis pairs or seen others)
- knowledge of more or varied relationships (and more logic paths) gives me more efficient / accurate ways to solve for an optimized path through the higher dimensions of word / symbol space.
- in a sense, I may never know the idea of "bros / sis", but it is probabilistically highly likely that a male and a female with the same parents are also bros / sis.
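A toy stand-in for the "prior words condition a distribution over the next word" idea in the list above: it only looks one word back and just counts co-occurrences, whereas the systems in the slides use recurrent nets over learned word vectors, so treat it purely as an illustration, not the presentation's method.

```python
from collections import Counter, defaultdict

corpus = "the dog chased the cat and the dog barked".split()

# Count which words follow which, then turn the counts into probabilities.
next_counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    next_counts[prev][nxt] += 1

def next_word_distribution(prev):
    counts = next_counts[prev]
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

print(next_word_distribution("the"))   # roughly {'dog': 0.67, 'cat': 0.33}
```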
Interesting. I'd love to see the output of this 25-way machine translation neural network.
Still... I'm not very knowledgeable about AI, but the part of this that describes biological mechanisms really feels handwavy to me.
What we have in our heads is not a cleaned up version of the input. There are no pixels or symbol strings in the head.
All we have in our heads is big activity vectors that cause more big activity vectors.
"Big activity vectors" means nothing. What is the structure of these vectors? Even if the brain's image processing turns out to have a lot of random complications and exceptions that fit the data your eyes have seen before (even if these innumerable complications are necessary for it to work well), surely some god with perfect understanding of it could still summarize it as generally following such-and-such algorithms. Even if you don't know what the neurons in your model are for, I bet most of them are in fact "for" something comprehensible by mere humans, and understanding that would make it easier to design good starting conditions for models. Not that I can prove it...
(Note: I suspect that the author of this presentation has a more nuanced view than what I'm complaining about - or maybe I'm just misinterpreting it. This comment is just intended to start discussion about what's written on the slide.)
The 'big activity vector' language is partly a reference to the Vector Word Embedding stuff (see [1] for an explanation). The surprising thing about that is that it is possible to learn an embedding of individual words in a multi-dimensional vector space, such that (for example) vector(Queen)-vector(Woman) ~= vector(King)-vector(Man). Which is to say that there's a general 'royalty' direction within the vector space - and all this can be learned purely from seeing large amounts of English text (no 'traditional' supervised training). Perhaps a 'god' could identify the meaning of each direction in the space (or region of words), but the big Machine Learning labs (Google, Baidu, Montreal, Stanford, Facebook, etc) are proving that purely manipulating the vectors in the abstract works really well.
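To make the vector-arithmetic claim concrete, here's a toy sketch with made-up 3-d vectors; embeddings learned from real text (word2vec, GloVe) behave similarly, just in hundreds of dimensions.

```python
import numpy as np

# Made-up vectors; dimensions loosely stand for royalty / male / female.
vec = {
    "king":  np.array([0.9, 0.8, 0.1]),
    "queen": np.array([0.9, 0.1, 0.8]),
    "man":   np.array([0.1, 0.8, 0.1]),
    "woman": np.array([0.1, 0.1, 0.8]),
}

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# The 'royalty direction' claim: king - man + woman should land nearest queen.
target = vec["king"] - vec["man"] + vec["woman"]
best = max(vec, key=lambda w: cosine(vec[w], target))
print(best)   # -> 'queen' with these toy vectors
```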
In addition to the 'word vectors' as inputs, the RNNs illustrated are also iterating over an internal state (flowing from left to right through the same network for each new word) - and this internal state is also an embedding of some kind. But it's going to be very difficult to decipher what each dimension here represents, as it's being built purely as a function of the input word vectors, its own previous state and an NN with initially random weights.
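A minimal sketch of that recurrence, with toy dimensions, random stand-in word vectors, and random weights of my own choosing (not the networks in the slides): the same weights fold each word vector into the running internal state.

```python
import numpy as np

rng = np.random.default_rng(0)
d_word, d_state = 4, 3                     # tiny sizes, purely for illustration
W_x = rng.normal(size=(d_state, d_word))   # word-vector -> state weights (random init)
W_h = rng.normal(size=(d_state, d_state))  # previous-state -> state weights (random init)

sentence = [rng.normal(size=d_word) for _ in range(5)]  # stand-in word vectors
h = np.zeros(d_state)                                   # initial internal state
for x in sentence:                                      # left to right, same weights each step
    h = np.tanh(W_x @ x + W_h @ h)

# The final state embeds the whole prefix, but no single dimension has an
# obvious meaning - which is the interpretability problem described above.
print(h)
```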
Now, although actual 'brain experiments' have shown that individual neurons (or local clusters) apparently light up when particular thoughts are had (alternatively, cause thoughts to be had), each cluster seems likely to be just one aspect of (say) 'dogginess'. So, one area will correspond to the smell of dogs, others to wet noses, others to being outdoors (i.e. all aspects of the overall 'dogginess' concept) - but these things will all overlap in multiple ways with other concept 'vectors'. Which is how huge spaces of ideas are searched in parallel, rather than sequentially (using, say, an is_doggy_quality symbol).
There are also parallels here with the Numenta Sparse Distributed Representations [2].
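A rough sketch of the overlapping-concept idea (and of Numenta-style sparse patterns); the unit count, sparsity, and the particular 'aspects' are invented for illustration.

```python
import numpy as np

N_UNITS = 2048   # width of the pattern; the exact number is arbitrary here

def sparse_concept(seed, n_active=40):
    """Pick a small random set of active units to stand in for one concept aspect."""
    rng = np.random.default_rng(seed)
    return set(rng.choice(N_UNITS, size=n_active, replace=False).tolist())

dog_smell = sparse_concept(1)
wet_nose  = sparse_concept(2)
outdoors  = sparse_concept(3)

# A composite 'dogginess' pattern as the union of its aspects: it overlaps each
# aspect heavily, while unrelated aspects barely overlap by chance. Comparing
# overlaps against many stored patterns is what lets the search run in parallel.
dogginess = dog_smell | wet_nose | outdoors
print(len(dogginess & wet_nose))   # = 40: wet_nose is entirely part of dogginess
print(len(dog_smell & outdoors))   # ~0: unrelated aspects share almost no units
```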
Overall, this presentation seems to be probing at the frontier of what works, and at how to leverage that into something that's more about 'general thinking' than pattern matching. It also appears to be a thought-piece rather than a conference presentation (though, of course, Hinton deserves to be heard on just about anything in NNs, IMHO).
Perhaps it's not getting votes because the title is so non-descriptive...
"Hinton's internal presentation on AI and Deep Learning" might attract the attention it deserves.
(I'm not really advocating a title change/resubmission, but if someone is reading the comments before going to Google Drive, this may give more of a hint about whether it's worth the bandwidth.)
I feel his explanation of activity vectors closely resembles/indicates Hebbian learning rules, due to the dependence on co-occurrences (a toy sketch of the basic rule follows below).
Pierre Baldi at UCI has a paper coming soon about the theory behind learning rules; I think it might present some interesting ideas for this work from Hinton.
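For concreteness, a minimal sketch of the plain Hebbian update mentioned above, with toy random activity (this is the textbook rule, not anything from Baldi's paper or Hinton's slides).

```python
import numpy as np

rng = np.random.default_rng(0)
n_pre, n_post = 5, 3
W = np.zeros((n_post, n_pre))   # connection strengths, start at zero
eta = 0.1                       # learning rate

for _ in range(100):
    x = (rng.random(n_pre) < 0.3).astype(float)    # presynaptic activity this step
    y = (rng.random(n_post) < 0.3).astype(float)   # postsynaptic activity this step
    W += eta * np.outer(y, x)                      # Hebb: weight grows where x and y co-occur

# Large entries mark unit pairs that tended to fire together; variants such as
# Oja's rule add normalisation so the weights stay bounded.
print(W)
```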
I would love to see what would come out of such an NN trained on a set of chess games. Would it "make the next valid move"? Would it appear to behave tactically? Strategically?
So close to the tech tree ding.