
I am 15 years into this computers thing and this blog post made me feel like "those guys are doing black magic".

Neural networks and deep learning are truly awesome technologies.




They are, but once you start learning about them, you realize the "black magic" part comes mostly from their mathematical nature and very little from them being "intelligent computers".

A neural net is a graph, in which a subset of nodes are "inputs" (that's where the net gets information), some are outputs, and there are other nodes which are called "hidden neurons".

The nodes are interconnected in a particular fashion, which is called the "topology" or sometimes "architecture" of the net. For example, I-H-O is a typical feedforward net, in which I is the input layer, H the hidden layer, and O the output layer. Each hidden neuron connects to the output of every input neuron, and each output neuron connects to the output of every hidden neuron. The connections carry "weights", and training adjusts the weights of all the neurons over lots of cases until the desired output is achieved. There are also algorithms and criteria to stop before the net "learns too much" and loses the ability to generalize (this is called overfitting). In particular, a net with one hidden layer and one output layer is a universal function approximator -- that is, one that can approximate any continuous function of the form f(x1, x2, x3, ..., xn) = y.
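The I-H-O picture can be made concrete in a few lines of numpy. A minimal sketch in the spirit of the tutorial linked below, learning XOR as a toy task; the layer sizes, seed, and iteration count are illustrative choices, not tuned values:

```python
import numpy as np

# A tiny I-H-O feedforward net learning XOR. The third input column acts
# as a bias; 4 hidden neurons and 60000 iterations are arbitrary choices.
np.random.seed(1)
X = np.array([[0, 0, 1], [0, 1, 1], [1, 0, 1], [1, 1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

W1 = 2 * np.random.random((3, 4)) - 1   # input -> hidden weights
W2 = 2 * np.random.random((4, 1)) - 1   # hidden -> output weights

for _ in range(60000):
    # forward pass: each layer is a weighted sum squashed by a nonlinearity
    hidden = sigmoid(X @ W1)
    output = sigmoid(hidden @ W2)
    # backward pass: nudge every weight to reduce the output error
    output_delta = (y - output) * output * (1 - output)
    hidden_delta = (output_delta @ W2.T) * hidden * (1 - hidden)
    W2 += hidden.T @ output_delta
    W1 += X.T @ hidden_delta

print(np.round(output).ravel())
```

After training, the rounded outputs reproduce the XOR truth table, the "desired output" mentioned above.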

Deep learning means you're using a feedforward net with lots of hidden layers (I think it's usually between 5 and 15 now), which apply convolution operators (hence the "convolutional" in the name), and lots of neurons (on the order of thousands). All this was nearly impossible until GPGPUs came along, because of the time it took to train even a modest network (minutes to hours for a net with between 50 and 150 neurons in one hidden layer).

This is a very shortened explanation -- if you want to read more, I recommend this link[1], which gives some simple Python code to illustrate and implement the innards of a basic neural network so you can learn it from the inside. Once you get that, you should move on to more mature implementations, like Theano or Torch, to get the full potential of neural nets without worrying about implementation.

[1] http://iamtrask.github.io/2015/07/12/basic-python-network/


>>They are, but once you start learning about them, you realize the "black magic" part comes mostly from their mathematical nature and very little from them being "intelligent computers".

Oh humbug! The black magic comes from the vast resources Google drew on to obtain perfect training datasets. Each step in the process took years to tune, demonstrating that data is indeed for those who don't have enough priors.


You could say very much the same about the brain...

> [...] the "black magic" part comes mostly from their mathematical nature and very little from them being "intelligent computers". A brain is a graph, in which a subset of neurons are "inputs", some are outputs, and others are "hidden". The nodes are interconnected in a particular fashion, which is called the "topology" or sometimes "architecture" of the net.

The deep question about deep learning is "Why is it so bloody effective?"


I work in the field, and while some models are based on biological structures/systems, there's a lot of fuss about them being "based on biological foundations" that is now best avoided. Yes, it is true the model is inspired by them, but it covers very little of the real complexity. So in a sense, it's naive to say "put a billion neurons in there and you'll get a rat brain" (as was publicized at one point).

The effectiveness comes from their non-linear nature and their ability to "learn" (store knowledge in the weights, derived from the training process). And black magic, of course!


If there is magic to be found, it may be in that question: what is it about graphs (namely the subset that are deep neural networks) that allows them not only to contain such powerful heuristics, but also to be built from scratch with barely any knowledge of the problem domain?

As a side note, I was playing a board game last night (Terra Mystica, I believe) and wondering if you could get 5 different neural networks to play the game and then train them against each other (and once they are good enough, against players). I wonder how quickly one could train a network that is unbeatable by humans? Maybe even scale it up to training it to play multiple board games until it is really good at all of them before setting it loose on a brand new one (of a similar genre). Maybe Google could use this to make a Go bot.

But what happens if this is used for evil instead? Say a neural network that reads a person's body language and determines how easily they can be intimidated by either a criminal or the government. Or one that is used to hunt down political dissidents. Imagine the first warrant to be signed by a judge for no reason other than a neural network saying the target is probably committing a crime...


The best Go bot approach (as of some years ago, but it's not like neural networks are a new idea) uses a very different strategy. Specifically, the strategy of "identify a few possible moves, simulate the game for several steps after each move using a very stupid move-making heuristic instead of using this actual strategy recursively, and then pick the move that yielded the best simulated board state".


Monte Carlo Tree Search (random playout) is currently the best computer strategy for evaluating a Go position.

This is likely due to the way Go works: random playout provides a rough estimate of who controls what territory (this is how Go is scored).
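The random-playout idea is easy to sketch. As a stand-in for Go (whose rules would swamp the example), here is the same finish-the-game-randomly-and-average scheme applied to the toy game of Nim; the game, seed, and playout count are my own illustrative choices:

```python
import random

# Evaluation by random playout, on Nim: take 1-3 stones per turn, and
# taking the last stone wins. This only illustrates the core MCTS idea:
# score a candidate move by finishing the game randomly many times.
random.seed(0)  # fixed seed so the sketch is reproducible

def random_playout(stones, my_turn):
    """Finish the game with uniformly random moves; return 1 if 'we' win."""
    while stones > 0:
        stones -= random.randint(1, min(3, stones))
        my_turn = not my_turn
    # the side that just moved (now *not* to move) took the last stone
    return 0 if my_turn else 1

def best_move(stones, playouts=2000):
    """Score each legal move by its random-playout win rate; pick the best."""
    scores = {}
    for take in range(1, min(3, stones) + 1):
        scores[take] = sum(random_playout(stones - take, my_turn=False)
                           for _ in range(playouts)) / playouts
    return max(scores, key=scores.get)

print(best_move(5))  # the optimal move from 5 stones is to take 1, leaving 4
```

Even against purely random continuations, the win-rate estimates are enough to separate the good move from the bad ones here, which is the rough-territory-estimate effect described above.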

Recently two deep-learning papers showed very impressive results.

http://arxiv.org/abs/1412.3409

http://arxiv.org/abs/1412.6564

The neural networks were tasked with predicting what move an expert would make given a position.

The MCTS takes a long time (100,000 playouts per move are typical); once trained, the neural nets are orders of magnitude faster.

The neural nets output a probability for each move (that an expert would make that move); all positions are evaluated in a single forward pass.

Current work centers on combining the two approaches: MCTS evaluates the best suggestions from the neural net.

Expert human players are still unbeaten by computer Go programs.


For Chess see David Silver's work on TreeStrap

It learns to master level from self-play.

http://www0.cs.ucl.ac.uk/staff/D.Silver/web/Applications_fil...

also his lecture bootstrapping from tree based search

http://www.cse.unsw.edu.au/~cs9414/15s1/lect/1page/TreeStrap...

and Silver's overview on board game learning

http://www0.cs.ucl.ac.uk/staff/D.Silver/web/Teaching_files/g...


The "use a stupid heuristic as part of the evaluation function" is is, in fact, also an important part of Chess AI's mode (as Quiescence Search), through for different reasons.


> Maybe Google could use this to make a Go bot.

There was in fact a group within Google that worked on this: http://www.cs.toronto.edu/~cmaddis/pubs/deepgo.pdf


And the follow-up from Google's DeepMind group:

Move Evaluation in Go Using Deep Convolutional Neural Networks, by Chris J. Maddison, Aja Huang, Ilya Sutskever, David Silver

http://arxiv.org/abs/1412.6564


Before clicking I was assuming it would fail. Then I read this in the summary: "When the trained convolutional network was used directly to play games of Go, without any search, it beat the traditional search program GnuGo in 97% of games, and matched the performance of a state-of-the-art Monte-Carlo tree search that simulates a million positions per move."


They are effective because:

- They use more parameters (and fewer computations per parameter).

- They are hierarchical (convolutions are apparently useful at different levels of abstraction of data).

- They are distributed (word2vec, thought-vectors). Not restricted to a small set of artificial classes such as parts-of-speech or parts of visual objects.

- They are recurrent (RNN).

etc.


word2vec isn't "deep" in the relevant sense. Both the skip-gram and CBOW forms have a single hidden layer.
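To see why it isn't deep: the whole skip-gram architecture is just two weight matrices with one hidden representation between them. A sketch with made-up vocabulary and embedding sizes (the zero weights are placeholders for trained values):

```python
import numpy as np

# Skip-gram word2vec in shape form: input word -> embedding -> context scores.
vocab_size, embed_dim = 10000, 300
W_in = np.zeros((vocab_size, embed_dim))   # word id -> embedding (the one hidden layer)
W_out = np.zeros((embed_dim, vocab_size))  # embedding -> scores over context words

word_id = 42                     # an arbitrary input word
hidden = W_in[word_id]           # a row lookup is the entire pass to the hidden layer
scores = hidden @ W_out          # one matrix multiply to the output layer
print(hidden.shape, scores.shape)
```

There is no stack of layers to speak of; the "network" is a single embedding lookup followed by a single linear map.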


It's not really that deep, imo: a typical deep net these days has O(10^8) parameters (e.g. http://stackoverflow.com/questions/28232235/how-to-calculate...). You can store a hell of a lot of patterns in that many parameters, making them the best pattern matchers the world has ever seen. (Un)fortunately, pattern matching != intelligence. More interesting deep questions for which there is precious little theory revolve around the design of the networks themselves.
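As a rough illustration of where an O(10^8) figure comes from, here is back-of-the-envelope arithmetic for an AlexNet-like stack; the layer sizes are quoted from memory and should be treated as approximate, not as the paper's exact architecture:

```python
# Parameter counts for convolutional and fully connected layers.

def conv_params(kh, kw, in_ch, out_ch):
    # each output channel has one kh x kw x in_ch filter plus a bias
    return (kh * kw * in_ch + 1) * out_ch

def dense_params(in_units, out_units):
    # a full weight matrix plus one bias per output unit
    return (in_units + 1) * out_units

total = (
    conv_params(11, 11, 3, 96)
    + conv_params(5, 5, 96, 256)
    + conv_params(3, 3, 256, 384)
    + conv_params(3, 3, 384, 384)
    + conv_params(3, 3, 384, 256)
    + dense_params(256 * 6 * 6, 4096)   # flattened conv output -> fc
    + dense_params(4096, 4096)
    + dense_params(4096, 1000)
)
print(f"{total:,} parameters")  # tens of millions, dominated by the fc layers
```

Most of the total sits in the fully connected layers, which is part of why "more parameters" and "pattern storage" go hand in hand.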


Is "pattern matching != intelligence" what occurred when the Google image recognition stuff in the news recently was shown to recognize the pattern of a "dumbbell" as always having a large muscular arm attached to it?

Seemed like a great way to highlight the limitations of patterns.


I hadn't heard about that but it sounds like what I'm talking about. With their ever expanding training corpus Google's net will eventually learn that dumbbells and arms are separate entities, but it will never deduce that on its own. And if it did it would not be able to generalize that to the fact that wedding rings and fingers are different (I hypothesize). Basically there is a whole other component of "intelligence" that feels absent from neural nets, which is why visions of AI lording over humanity don't exactly keep me up at night. (Autonomous weapons otoh...)


> Deep learning means ... which apply convolution operators

Convolutional networks are only one kind of deep learning. In particular, they generally apply only to image processing.


They are doing matrix multiplications. Passing input a single time through even a very large neural network is a relatively fast operation (compared to training such a network, that is). Training requires data centers and arrays of GPUs; for passing input through the network, you can usually get away with a single core and vectorized operations. Unless you are doing high-resolution computer vision in real time... and even then you can get away with a single core, but that requires some very smart sublinear processing.
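A sketch of what "passing input through the network" amounts to: a handful of matrix products with a nonlinearity in between. The layer sizes are arbitrary examples, and for brevity the same ReLU is applied to every layer including the last:

```python
import numpy as np

# Inference through a small multi-layer perceptron is just a chain of
# matrix-vector products; random weights stand in for a trained model.
rng = np.random.default_rng(0)
layers = [rng.normal(size=(784, 256)),   # input -> hidden 1
          rng.normal(size=(256, 128)),   # hidden 1 -> hidden 2
          rng.normal(size=(128, 10))]    # hidden 2 -> output

def forward(x):
    for W in layers:
        x = np.maximum(x @ W, 0)         # matrix multiply + ReLU
    return x

x = rng.normal(size=784)
print(forward(x).shape)
```

Each call is a few small dense products, which is why a single vectorized core is usually enough at inference time, while training repeats this (plus gradients) millions of times.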


Completely right. Applying a neural network is much faster than training one. The main trick here is fitting the trained model into cache (or smaller) so that the matrix multiplies are fast.


> this blog post made me feel like "those guys are doing black magic".

Two remarks. First, these guys probably don't know very well why what they are doing works so well ;) It requires a lot of trial and error, and a lot of patience and a lot of compute power (the latter being the reason why we are seeing breakthroughs only now).

Second, training a neural net requires different computing power from deploying the net. The neural network that is installed on your phone has been trained using a lot of time and/or a very large cluster. Your phone is merely "running" the network, and this requires much less compute power.


Of course they are


they are awesome, but not that difficult to implement





