We propose an alternative approach to construct an artificial learning system, which naturally learns in an unsupervised manner. Its mathematical prototype is a dynamical system, which automatically shapes its vector field in response to the input signal. The vector field converges to a gradient of a multi-dimensional probability density distribution of the input process, taken with negative sign. The most probable patterns are represented by the stable fixed points, whose basins of attraction are formed automatically. The performance of this system is illustrated with musical signals.
They estimated how likely particular notes were, using a kind of averaging procedure that corresponds to simulating an elastic foam. They used a Fourier transform to identify the notes in a piece of music, then simulated the physical procedure of placing a stone on the foam at a location corresponding to the note's frequency, at a time corresponding to when the note was played. The final shape of the foam then gives a density estimate of the data: the more often a particular note was played, the deeper the dip in the foam at that location ('the foam evolves into a time-averaged density of the input').
Once you've done this, you have a curved surface. You can imagine a ball being released at some point and rolling around: it could get stuck in the bottom of a well ('a stable fixed point'), which corresponds to the 'most probable patterns' in the training data.
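For anyone who wants to poke at the foam picture, here is a minimal 1-D sketch in plain Python. It is not the authors' implementation: the Gaussian dent shape, the `width`, the step size and the toy `stones` list are all my assumptions, and the time dependence is left out. It only shows dents averaging into a surface and a ball settling at the bottom of the nearest well.

```python
import math

def dip(x, center, width=0.5):
    """Gaussian-shaped dent a single 'stone' leaves in the foam (assumed shape)."""
    return math.exp(-0.5 * ((x - center) / width) ** 2)

def height(x, stones):
    """Foam height at x: flat at 0, lowered by the average dent of all stones."""
    return -sum(dip(x, s) for s in stones) / len(stones)

def slope(x, stones, eps=1e-5):
    """Central-difference estimate of d(height)/dx."""
    return (height(x + eps, stones) - height(x - eps, stones)) / (2 * eps)

def roll(x, stones, step=0.1, iters=2000):
    """Release a ball at x and let it move downhill until it settles."""
    for _ in range(iters):
        x -= step * slope(x, stones)
    return x

# Hypothetical input: a note near 2.0 played often, a note near 5.0 played once.
stones = [1.9, 2.0, 2.0, 2.1, 2.0, 5.0]

resting_a = roll(2.6, stones)  # released in the basin of the frequent note
resting_b = roll(4.6, stones)  # released in the basin of the rare note
```

Both balls end up in a well (a stable fixed point), but `height(2.0, stones)` is far lower than `height(5.0, stones)` because that 'note' occurred five times as often.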
Edit to add:
"The vector field converges to a gradient of a multi-dimensional probability density distribution of the input process, taken with negative sign"
You can think of the "vector field" of the dynamical system describing the movement of the ball as a bunch of arrows, giving the direction the ball would move if released from each possible starting point. If you trace these arrows end-to-end, you can see the trajectories that the ball could take.
Since the ball will roll downhill, the vector field of the system is (minus) the gradient of the potential energy/height. Also, because of how we formed the surface, its height 'converged to the probability density of the input process', up to sign and scale (the more frequent an input was, the lower the corresponding dip in the foam). Thus, up to that sign, the vector field of the system converges to the gradient of the probability density of the input.
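To make the 'bunch of arrows' concrete, here is a rough 2-D sketch. Everything in it is an assumption for illustration: a single Gaussian bump stands in for the input density, `CENTER` is a made-up location, and I take the height to be minus the density (deeper dip where inputs are more frequent). The arrow at each point is minus the numerical gradient of the height.

```python
import math

CENTER = (1.0, -1.0)  # hypothetical location of the most frequent input

def density(x, y):
    """Assumed input density: one unnormalised Gaussian bump around CENTER."""
    dx, dy = x - CENTER[0], y - CENTER[1]
    return math.exp(-0.5 * (dx * dx + dy * dy))

def height(x, y):
    """Assumed sign convention: frequent inputs make deeper dips."""
    return -density(x, y)

def vector_field(x, y, eps=1e-6):
    """Arrow at (x, y): minus the central-difference gradient of the height."""
    gx = (height(x + eps, y) - height(x - eps, y)) / (2 * eps)
    gy = (height(x, y + eps) - height(x, y - eps)) / (2 * eps)
    return (-gx, -gy)

def points_toward_center(x, y):
    """True if the arrow at (x, y) has a positive component toward CENTER."""
    vx, vy = vector_field(x, y)
    return vx * (CENTER[0] - x) + vy * (CENTER[1] - y) > 0

starts = [(0.0, 0.0), (2.0, -1.0), (1.0, 0.5), (-0.5, -2.0)]
all_downhill = all(points_toward_center(x, y) for x, y in starts)
arrow_at_origin = vector_field(0.0, 0.0)
arrow_at_center = vector_field(*CENTER)
```

Under these assumptions every arrow leans toward the bottom of the dip, and the arrow at the bottom itself vanishes, which is what makes it a fixed point.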
It's almost exactly a rubber sheet with balls placed upon the surface. The weight of the balls creates depressions in the sheet. This is it really, except for two extra things:
1: When the balls are removed, the sheet slowly returns to flat.
2: The sheet isn't two-dimensional but has as many dimensions as the input vector.
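Those two extras can be sketched in a few lines of plain Python. This is a toy under my own assumptions, not the paper's equations: `LeakySheet`, the per-step `decay`, and the Gaussian dent `width` are all made-up names and values. Dents are stored as (center, depth) pairs, so the input can have any number of dimensions.

```python
import math

class LeakySheet:
    """Sheet whose dents fade over time; points are tuples of any length."""

    def __init__(self, decay=0.1, width=1.0):
        self.decay = decay    # assumed fraction of depth lost per time step
        self.width = width    # assumed dent width
        self.dents = []       # list of [center, depth]

    def step(self, ball=None):
        """Advance one time step; optionally place a ball at `ball`."""
        for dent in self.dents:
            dent[1] *= 1.0 - self.decay   # every existing dent relaxes toward flat
        if ball is not None:
            self.dents.append([tuple(ball), 1.0])

    def height(self, point):
        """Height at `point`: 0 is flat, negative is a depression."""
        total = 0.0
        for center, depth in self.dents:
            sq = sum((p - c) ** 2 for p, c in zip(point, center))
            total -= depth * math.exp(-0.5 * sq / self.width ** 2)
        return total

sheet = LeakySheet()
spot = (0.2, -1.0, 3.5)          # a 3-dimensional input vector
for _ in range(20):
    sheet.step(ball=spot)        # balls keep landing on the same spot
loaded = sheet.height(spot)

for _ in range(100):
    sheet.step()                 # no more input: the sheet relaxes
relaxed = sheet.height(spot)
```

Note the dent list grows with every input, so this is only good for small experiments; a real system would need a more compact representation of the surface.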
Another arXiv preprint by the same authors has more detail, and seems to be essentially the same article, fleshed out more: http://arxiv.org/abs/1111.4443
Yay, someone with the sophistication required is working on this! As I've been thinking for a few years, intelligence is more about absorbing patterns from the world than it is about the deliberate process of constructing patterns from scratch within the system. It is better to create a space where the pattern of the world can live, subject to some measure of objective utility, than it is to deduce some pattern from scratch using that same objective function.
You're just describing unsupervised learning; these guys are far from the only ones working in the field, and this is far from the most impressive approach. This is just time-averaged kernel density estimation, with an approachable metaphor attached to it.
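To show what I mean by time-averaged kernel density estimation, here is a small 1-D sketch (my own toy, with an assumed Gaussian kernel, `bandwidth`, grid and sample values; it is not code from the paper). Updating the estimate one sample at a time with rate 1/t reproduces the ordinary batch KDE, while a constant rate gives an estimate that forgets old inputs exponentially.

```python
import math

def kernel(u, bandwidth=0.3):
    """Gaussian kernel, normalised to integrate to 1."""
    return math.exp(-0.5 * (u / bandwidth) ** 2) / (bandwidth * math.sqrt(2 * math.pi))

def batch_kde(grid, samples):
    """Ordinary KDE: average of one kernel per sample."""
    return [sum(kernel(g - s) for s in samples) / len(samples) for g in grid]

def streaming_kde(grid, samples, rate=None):
    """Update the estimate one sample at a time.

    rate=None uses 1/t, which reproduces the plain average;
    a constant rate forgets old samples exponentially.
    """
    estimate = [0.0] * len(grid)
    for t, s in enumerate(samples, start=1):
        a = 1.0 / t if rate is None else rate
        estimate = [(1 - a) * e + a * kernel(g - s) for e, g in zip(estimate, grid)]
    return estimate

grid = [i * 0.1 for i in range(61)]            # 0.0 .. 6.0
samples = [2.0, 2.1, 1.9, 5.0, 2.0, 2.05]      # hypothetical inputs

plain = batch_kde(grid, samples)
running = streaming_kde(grid, samples)
# Three early inputs at 5.0, then thirty at 2.0, with a constant forgetting rate.
forgetful = streaming_kde(grid, [5.0] * 3 + [2.0] * 30, rate=0.2)
```

With the constant rate, the early inputs at 5.0 have almost no influence left by the end, which is the 'time-averaged' part.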
Cognition is pattern recognition, and we don't ever access the raw data, just the output labels - this is so Buddhist. In a sense we create our (inner) world because it is just patterns.
Recognizing patterns in data is so easy that the ease itself turns out to be a major problem in machine learning. You can always find some explanation that perfectly fits all the data you see, but that doesn't mean it will fit the data you haven't yet seen.
The challenge is finding the simplest patterns that will generalize to explain the most data, while wasting as little effort as possible on the irrelevant patterns. In high-dimensional data, the number of possible relationships to analyze explodes. You can find patterns everywhere you look, so the hard part is deciding where to bother looking with your limited resources.
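A tiny illustration of the "perfect fit that doesn't generalize" point, in plain Python. The data are made up: a hypothetical true relationship y = 2x + 1 plus a fixed list of small noise values. A degree-7 polynomial through all eight training points fits them exactly, while a straight line does not, yet the line is far better on inputs it has not seen.

```python
def lagrange_fit(xs, ys):
    """Return the polynomial that passes exactly through every (x, y) pair."""
    def predict(x):
        total = 0.0
        for i, (xi, yi) in enumerate(zip(xs, ys)):
            term = yi
            for j, xj in enumerate(xs):
                if j != i:
                    term *= (x - xj) / (xi - xj)
            total += term
        return total
    return predict

def line_fit(xs, ys):
    """Return the ordinary least-squares straight line."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return lambda x: my + slope * (x - mx)

def mean_abs_error(model, xs, ys):
    """Average absolute difference between predictions and targets."""
    return sum(abs(model(x) - y) for x, y in zip(xs, ys)) / len(xs)

def truth(x):
    """Hypothetical underlying pattern."""
    return 2.0 * x + 1.0

noise = [0.3, -0.4, 0.5, -0.2, 0.4, -0.5, 0.2, -0.3]   # fixed, made-up noise
train_x = [float(i) for i in range(8)]
train_y = [truth(x) + e for x, e in zip(train_x, noise)]
test_x = [8.0, 9.0]                                     # inputs not seen in training
test_y = [truth(x) for x in test_x]

wiggly = lagrange_fit(train_x, train_y)
simple = line_fit(train_x, train_y)

errors = {
    "wiggly_train": mean_abs_error(wiggly, train_x, train_y),
    "simple_train": mean_abs_error(simple, train_x, train_y),
    "wiggly_test": mean_abs_error(wiggly, test_x, test_y),
    "simple_test": mean_abs_error(simple, test_x, test_y),
}
```

The interpolating polynomial has zero training error because it has memorized the noise, and that memorized noise is exactly what blows up once you step outside the training inputs.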
That patterns are nothing special doesn't seem obvious to us because evolution has done a pretty good job of solving this problem (in the domain of inputs we evolved to deal with), and we only perceive those patterns that are likely to generalize.
"What lay behind the illusion of the park was just the real park."
So no, not that Buddhist. The function of cognition is not to detach ourselves from an insignificant, not-really-there illusionary world, but in fact to create increasingly accurate, but still tractable, "illusions" that let our brains draw closer and closer to the real world.
Exactly. I'm secretly thinking that some of the reason we have failed so far at general AI is that we are coming at the problem from a Western philosophical tradition :) Nice to meet you, brother.
Could anyone ELI5 this?