They are, but once you start learning about them, you realize the "black magic" part comes mostly from their mathematical nature and very little from them being "intelligent computers".
A neural net is a graph in which a subset of nodes are "inputs" (that's where the net gets information), some are outputs, and the rest are called "hidden neurons".
The nodes are interconnected in a fashion called the "topology" or sometimes the "architecture" of the net. For example, I-H-O is a typical feed-forward net, in which I (inputs) is the input layer, H is the hidden layer and O the output layer. Every hidden neuron connects to the outputs of all the input neurons, and every output neuron connects to the outputs of all the hidden neurons. The connections are called "weights", and training adjusts the weights of all the neurons over lots of cases until the desired output is achieved. There are also algorithms and criteria to stop before the net "learns too much" and loses the ability to generalize (this is called overfitting). In particular, a net with one hidden layer and one output layer is a universal function approximator -- that is, it can approximate essentially any function of the form f(x1, x2, x3, ..., xn) = y.
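To make that concrete, here's a toy sketch in plain NumPy (not from any particular library; the XOR task, layer sizes and learning rate are just illustrative) of an I-H-O net whose weights get adjusted by gradient descent:

    import numpy as np

    # Tiny I-H-O feed-forward net: 2 inputs, 4 hidden neurons, 1 output,
    # trained on XOR with plain gradient descent. Purely illustrative.
    rng = np.random.default_rng(0)
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
    y = np.array([[0], [1], [1], [0]], dtype=float)

    W1, b1 = rng.normal(size=(2, 4)), np.zeros(4)   # input -> hidden weights
    W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)   # hidden -> output weights

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    for step in range(20000):
        # Forward pass: each layer is a matrix multiply plus a non-linearity.
        h = sigmoid(X @ W1 + b1)
        out = sigmoid(h @ W2 + b2)

        # Backward pass: push the error back and nudge the weights.
        d_out = (out - y) * out * (1 - out)
        d_h = (d_out @ W2.T) * h * (1 - h)
        W2 -= 0.5 * h.T @ d_out; b2 -= 0.5 * d_out.sum(axis=0)
        W1 -= 0.5 * X.T @ d_h;   b1 -= 0.5 * d_h.sum(axis=0)

    print(np.round(out, 2))  # should end up close to [[0], [1], [1], [0]]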
Deep learning means you're using a feed-forward net with lots of hidden layers (usually somewhere between 5 and 15 these days), which apply convolution operators (hence the "convolutional" in the name), and lots of neurons (on the order of thousands). All this was nearly impossible until GPGPUs came along, because of the time it took to train even a modest network (minutes to hours for a net with between 50 and 150 neurons in one hidden layer).
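If "convolution operator" sounds abstract, here is a naive sketch of the basic operation (the image and filter values are made up, and real frameworks do this far more efficiently; strictly speaking the layers compute cross-correlation):

    import numpy as np

    def conv2d(image, kernel):
        # Naive "valid" convolution: slide the kernel over the image and
        # take a dot product at every position, producing a feature map.
        H, W = image.shape
        kh, kw = kernel.shape
        out = np.zeros((H - kh + 1, W - kw + 1))
        for i in range(out.shape[0]):
            for j in range(out.shape[1]):
                out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
        return out

    image = np.random.rand(8, 8)             # stand-in grayscale input
    edge_filter = np.array([[1., 0., -1.],
                            [1., 0., -1.],
                            [1., 0., -1.]])  # crude vertical-edge detector
    print(conv2d(image, edge_filter).shape)  # (6, 6) feature map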
This is a very shortened explanation -- if you want to read more I recommend this link[1], which gives some simple Python code to illustrate and implement the innards of a basic neural network, so you can learn it from the inside. Once you get that, you should move on to more mature implementations like Theano or Torch to get the full potential of neural nets without worrying about implementation.
>>They are, but once you start learning about them, you realize the "black magic" part comes mostly from their mathematical nature and very little from them being "intelligent computers".
Oh, humbug! The black magic comes from the vast resources Google drew on to obtain perfect training datasets. Each step in the process took years to tune, demonstrating that data is indeed for those who don't have enough priors.
You could say very much the same about the brain...
> [...] the "black magic" part comes mostly from their mathematical nature and very little from them being "intelligent computers". A brain is a graph in which a subset of neurons are "inputs", some are outputs, and others are "hidden". The neurons are interconnected in a fashion called the "topology" or sometimes the "architecture" of the net.
The deep question about deep learning is "Why is it so bloody effective?"
I work in the field, and while some models are based on biological structures/systems, there's a lot of fuss about them being "based on biological foundations" that is now best avoided. Yes, it is true the model is inspired by biology, but it captures only a tiny fraction of the real complexity. So in a sense, it's naive to say "put a billion neurons in there and you'll get a rat brain" (as was publicized at one time).
The effectiveness comes from their non-linear nature and their ability to "learn" (store knowledge in the weights, derived from the training process). And black magic, of course!
If there is magic to be found, it may be in that question. What is it about these graphs (namely the subset that are deep neural networks) that allows them not only to contain such powerful heuristics, but also to be built from scratch with barely any knowledge of the problem domain?
As a side note, I was playing a board game last night (Terra Mystica, I believe) and wondering if you could get 5 different neural networks to play the game and then train them against each other (and once they are good enough, against players). I wonder how quickly one could train a network that is unbeatable by humans? Maybe even scale it up by training it to play multiple board games until it is really good at all of them before setting it loose on a brand new one (of a similar genre). Maybe Google could use this to make a Go bot.
But what happens if this is used for evil instead? Say a neural network that reads a person's body language and determines how easily they can be intimidated by either a criminal or the government. Or one that is used to hunt down political dissidents. Imagine the first warrant to be signed by a judge for no reason other than a neural network saying the target is probably committing a crime...
The best Go bot approach (as of some years ago, but it's not like neural networks are a new idea) uses a very different strategy. Specifically: identify a few possible moves, simulate the game for several steps after each move using a very stupid move-making heuristic (rather than applying this actual strategy recursively), and then pick the move that yielded the best simulated board state.
The "use a stupid heuristic as part of the evaluation function" is is, in fact, also an important part of Chess AI's mode (as Quiescence Search), through for different reasons.
Before clicking I was assuming it would fail. Then I read this in the summary: "When the trained convolutional network was used directly to play games of Go, without any search, it beat the traditional search program GnuGo in 97% of games, and matched the performance of a state-of-the-art Monte-Carlo tree search that simulates a million positions per move."
- They use more parameters (and fewer computations per parameter.)
- They are hierarchical (convolutions are apparently useful at different levels of abstraction of data).
- They are distributed (word2vec, thought-vectors). Not restricted to a small set of artificial classes such as parts-of-speech or parts of visual objects.
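On the "distributed" point, a toy illustration (the three-word "vocabulary" and the vector values are made up, not real word2vec output): each word is a dense vector rather than a discrete class, and similarity falls out of the geometry.

    import numpy as np

    # Made-up dense vectors standing in for learned word2vec embeddings.
    vectors = {
        "king":  np.array([0.80, 0.65, 0.10]),
        "queen": np.array([0.78, 0.70, 0.12]),
        "apple": np.array([0.05, 0.10, 0.90]),
    }

    def cosine(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

    # Meaning is spread across all the dimensions instead of living in a
    # single class label, so related words simply end up close together.
    print(cosine(vectors["king"], vectors["queen"]))  # high
    print(cosine(vectors["king"], vectors["apple"]))  # low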
It's not really that deep, imo: a typical deep net these days has O(10^8) parameters (e.g. http://stackoverflow.com/questions/28232235/how-to-calculate...). You can store a hell of a lot of patterns in that many parameters, making them the best pattern matchers the world has ever seen. (Un)fortunately, pattern matching != intelligence. More interesting deep questions for which there is precious little theory revolve around the design of the networks themselves.
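For a feel of where those O(10^8) parameters come from, a back-of-the-envelope sketch (the layer sizes are made up, roughly AlexNet-ish, not an exact count of any real model):

    def conv_params(in_channels, out_channels, k):
        # One k x k filter per (input channel, output channel) pair, plus biases.
        return in_channels * out_channels * k * k + out_channels

    def dense_params(n_in, n_out):
        # Fully connected layer: one weight per input/output pair, plus biases.
        return n_in * n_out + n_out

    total = (conv_params(3, 96, 11)        # first conv layer over an RGB image
             + conv_params(96, 256, 5)
             + conv_params(256, 384, 3)
             + dense_params(9216, 4096)    # the dense layers dominate the count
             + dense_params(4096, 4096)
             + dense_params(4096, 1000))
    print(total)                           # roughly 6e7, i.e. order 10^7-10^8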
Is "pattern matching != intelligence" what occurred when the Google image recognition stuff in the news recently was shown to recognize the pattern of a "dumbbell" as always having a large muscular arm attached to it?
Seemed like a great way to highlight the limitations of patterns.
I hadn't heard about that, but it sounds like what I'm talking about. With their ever-expanding training corpus, Google's net will eventually learn that dumbbells and arms are separate entities, but it will never deduce that on its own. And if it did, it would not be able to generalize that to the fact that wedding rings and fingers are different (I hypothesize). Basically, there is a whole other component of "intelligence" that feels absent from neural nets, which is why visions of AI lording over humanity don't exactly keep me up at night. (Autonomous weapons, otoh...)
They are doing matrix multiplications. Passing input a single time through even a very large neural network is a relatively fast operation (compared to training such a network, that is). Training requires data centers and arrays of GPUs. Passing the input through the network usually needs only a single core and vectorized operations. Unless you are doing high-resolution computer vision in real time... You can still get away with a single core even then, but that requires some very smart sublinear processing.
Completely right. Applying a neural network is much faster than training one. The main trick here is fitting the trained model into cache (or smaller) so that the matrix multiplies are fast.
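A minimal sketch of what "running" the network amounts to (the random weights here are placeholders for whatever training produced, and the layer sizes are invented): a handful of matrix multiplies and element-wise non-linearities, which is why inference fits comfortably on a single core or a phone.

    import numpy as np

    # Random placeholders standing in for the weights a trained model ships with.
    rng = np.random.default_rng(0)
    layers = [(rng.standard_normal((128, 256)), rng.standard_normal(256)),
              (rng.standard_normal((256, 64)),  rng.standard_normal(64)),
              (rng.standard_normal((64, 10)),   rng.standard_normal(10))]

    def forward(x):
        # Inference is just matrix multiply + bias + non-linearity, layer by layer.
        for W, b in layers:
            x = np.maximum(x @ W + b, 0.0)   # ReLU
        return x

    x = rng.standard_normal(128)             # one input vector
    print(forward(x).shape)                  # (10,) output vector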
> this blog post made me feel like "those guys are doing black magic".
Two remarks. First, these guys probably don't know very well why what they are doing works so well ;) It requires a lot of trial and error, a lot of patience, and a lot of compute power (the latter being the reason we are seeing breakthroughs only now).
Second, training a neural net requires very different computing power than deploying it. The neural network that is installed on your phone was trained over a lot of time and/or on a very large cluster. Your phone is merely "running" the network, and that requires much less compute power.
Neural networks and deep learning are truly awesome technologies.