Great, let's rewrite it. We'll start with this piece here and get immediate benefit without having to complete the entire rewrite first, and then we'll decide which piece to do next!
The difference between 'high' and 'low' metabolism is very small. You just eat less, which is why you're thin as a rail. People are very bad at self-reporting caloric intake; there are several studies on this.
I watched the commentary on YouTube and it was fantastic! I don't play Go myself but I was glued to the screen the whole way. I particularly enjoyed how the commentators demonstrated why the moves made sense by playing out theoretical future moves right on the board they had up.
I was actually thinking primarily of distributed training time for the networks and playing time for the system, rather than the number of GPUs running this particular match. Also, I thought the number of GPUs in October was more on the order of 1,000? Happy to be told I'm mistaken though.
Sort-of repeating a comment I made last time AlphaGo came up:
As far as I know there is nothing particularly novel about AlphaGo, in the sense that if we stuck an AI researcher from ten years ago in a time machine to today, the researcher would not be astonished by the brilliant new techniques and ideas behind AlphaGo; rather, the time-traveling researcher would probably categorize AlphaGo as the result of ten years' incremental refinement of already-known techniques, and of ten years' worth of hardware development coupled with a company able to devote the resources to building it.
So if what we had ten years ago wasn't generally considered "true AI", what about AlphaGo causes it to deserve that title, given that it really seems to be just "the same as we already had, refined a bit and running on better hardware"?
It's easy to say that, in the same way that people now wouldn't be surprised if a working quantum computer could factor large numbers in polynomial time!
10 years ago no one believed it was possible to train deep nets[1].
It wasn't until the current "revolution" that people learned how important parameter initialization was. Sure, it's not a new algorithm, but it made the problem tractable.
So far as algorithmic innovations go, there are always ReLU (2011) and leaky ReLU (2014). The one-weird-trick paper was pretty important too. (A quick NumPy sketch of both activations is below the footnote.)
[1] "Training deep multi-layered neural networks is known to be hard. The standard learning strategy—consisting of randomly initializing the weights of the network and applying gradient descent using backpropagation—is known empirically to find poor solutions for networks with 3 or more hidden layers. As this is a negative result, it has not been much reported in the machine learning literature. For that reason, artificial neural networks have been limited to one or two hidden layers."
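For anyone who hasn't seen them, here's a quick sketch of the two activations mentioned above, in plain NumPy. This is my own illustration, and the 0.01 negative slope is just the commonly used default for the leaky variant, not something specific to those papers.

    import numpy as np

    def relu(x):
        # Rectified linear unit: passes positives through, zeroes out negatives,
        # so gradients don't saturate the way sigmoid/tanh gradients do.
        return np.maximum(0.0, x)

    def leaky_relu(x, slope=0.01):
        # Leaky variant: keeps a small slope for negative inputs so a unit
        # can't get permanently stuck with zero gradient ("dying ReLU").
        return np.where(x > 0, x, slope * x)

    x = np.linspace(-3, 3, 7)
    print(relu(x))
    print(leaky_relu(x))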
It's only difficult because no one threw money at it. It's like saying going to Mars is difficult. It is, but most of the technology is already there; we just need money to improve what was used to go to the Moon.
If you had asked people 10 years before the Moon landing whether it was possible, I too would have agreed it was impossible. But that breakthrough opened up the realm of possibilities.
I see AlphaGo more of an incremental improvement than a breakthrough.
Agreed. People don't realize that all of the huge algorithmic innovations (LSTMs, convolutional neural networks, backpropagation) were invented in past neural net booms. I can't think of any novel algorithms of the same impact and ubiquity (i.e., universally considered to be huge algorithmic leaps) that have been invented during the current boom. The current boom started due to GPUs.
Something being invented previously doesn't mean it existed as a matter of engineering practicality; improved performance is part of that, but not all of it. Just describing something in a paper isn't enough to give it impact; many things described in papers simply don't work as described.
A decade ago I was trying and failing to build multi-layer networks with back-propagation: it didn't work so well. More modern, refined training techniques seem to work much better... and today tools for them are ubiquitous and known to work (especially with extra CPU thrown at them :) ).
The point is that no one could train deep nets 10 years ago. Not just because of computing power, but because of bad initializations, and bad transfer functions, and bad regularization techniques, etc.
These things might seem like "small iterative refinements", but they add up to a 100x improvement even before you consider hardware. And you should consider hardware too; it's also a factor in the advancement of AI. (There's a toy illustration of the initialization point after this comment.)
Also, reading through old research, there are a lot of silly ideas along with the good ones. It's only in retrospect that we know this specific set of techniques works and the rest are garbage. At the time it was far from certain what the future of NNs would look like; to say it was predictable is hindsight bias.
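To make the initialization point concrete, here's a toy experiment (my own sketch, not taken from any particular paper): push random data through a 20-layer ReLU stack and compare a naive fixed small-weight initialization with He-style scaling. The naive version's activations collapse toward zero; the scaled version stays roughly stable.

    import numpy as np

    rng = np.random.default_rng(0)
    depth, width = 20, 256
    x = rng.standard_normal((128, width))

    def run(scale_fn):
        h = x
        for _ in range(depth):
            w = rng.standard_normal((width, width)) * scale_fn(width)
            h = np.maximum(0.0, h @ w)  # ReLU layer
        return h.std()

    # Naive fixed small weights: the signal shrinks layer by layer.
    print("naive 0.01 init:", run(lambda n: 0.01))
    # He-style scaling, sqrt(2 / fan_in), keeps the signal's scale roughly constant.
    print("He-style init  :", run(lambda n: np.sqrt(2.0 / n)))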
Reconnecting to my original point way up-thread: these "innovations" have not substantially expanded the types of models we are capable of expressing (though they have certainly expanded the size of the models we're able to train), not nearly to the degree that backprop, convnets, and LSTMs did decades ago. That matters because AGI will require several more expansions in the types of models we are capable of implementing.
Right, LSTM was invented 20 years ago. 20 years from now, the great new thing will be something that has been published today. It takes time for new innovations to gain popularity and find their uses. That doesn't mean innovations are not being made!
> As far as I know there is nothing particularly novel about AlphaGo,
By that standard there's nothing particularly novel about anything. Everything we have today is just a slight improvement of what we already had yesterday.
World experts in go and ML as recently as last year thought it would be many more years before this day happened. Who are you to trivialize this historic moment?
Some Go experts believed, less than 10 years ago, that it would be accomplished within 10 years. Also, you didn't actually refute his argument. Can you point to an algorithm here that isn't an incremental improvement over algorithms that existed 10 years ago? MCTS is roughly a decade old, and reinforcement learning with function approximators goes back more than 20 years.
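For reference, this is roughly the selection rule MCTS (in its UCT form) has used since the mid-2000s. A minimal sketch, not AlphaGo's exact variant, with a typical default for the exploration constant:

    import math

    def uct_select(children, c=1.4):
        # `children` is a list of (visit_count, total_reward) pairs for one node.
        # Pick the child maximizing value estimate + exploration bonus (UCB1);
        # unvisited children are expanded first.
        total_visits = sum(n for n, _ in children)
        best, best_score = None, -math.inf
        for i, (n, reward) in enumerate(children):
            if n == 0:
                return i
            score = reward / n + c * math.sqrt(math.log(total_visits) / n)
            if score > best_score:
                best, best_score = i, score
        return best

    print(uct_select([(10, 6.0), (3, 2.5), (1, 0.0)]))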
No, that's what they're saying. Take any invention and you can break it down into just a slight improvement of the sub-inventions it consists of.
A light bulb is just a metal wire encased in a non-flammable gas with electricity run through it. It was long known that things get hot when you run electricity through them, that hot things glow and catch fire, that you can prevent the fire by removing oxygen, and that glass is transparent. It's not a big deal to combine these components. A lot of people still celebrate it as a great invention, and in my opinion it is! Think about how inconvenient gas lighting was and how much better electric light is.
Same thing with AlphaGo. Sure, if you break it down to its subcomponents it's just clever application of previously known techniques, like any other invention. But it's the result that makes it cool, not how they arrived at it!
All algorithms are incremental improvements of existing techniques. This isn't a card you can use to diminish all progress as "just a minor improvement, what's the fuss?".
No, not all inventions are incremental improvements of existing techniques. Backpropagation and convolutional nets, for example. Now, you might counter that backprop is just the chain rule (and that convolution existed before that), but the point is that the algorithm had never been used in machine learning before. (There's a toy backprop-as-chain-rule sketch after this comment.)
People had used neural nets as function approximators for reinforcement learning, and MCTS for game playing, well before AlphaGo.
Your lightbulb example actually supports my point. The lightbulb was the product of more than a half-century of work by hundreds of engineers/scientists. I have no problem with pointing to 70 years of work as a breakthrough invention.
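Here's the toy backprop-as-chain-rule sketch mentioned above (my own illustration): gradients for a one-hidden-layer network computed by hand with the chain rule, checked against a finite-difference estimate.

    import numpy as np

    rng = np.random.default_rng(1)
    x = rng.standard_normal(4)          # input vector
    w1 = rng.standard_normal((3, 4))    # hidden-layer weights
    w2 = rng.standard_normal(3)         # output weights

    def forward(w1):
        h = np.tanh(w1 @ x)             # hidden activations
        return w2 @ h                   # scalar output y

    # Chain rule: dy/dw1[i, j] = w2[i] * (1 - tanh(w1 @ x)[i]**2) * x[j]
    grad_w1 = np.outer(w2 * (1.0 - np.tanh(w1 @ x) ** 2), x)

    # Finite-difference check on one weight.
    eps = 1e-6
    w1_shift = w1.copy()
    w1_shift[0, 0] += eps
    print(grad_w1[0, 0], (forward(w1_shift) - forward(w1)) / eps)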
I think the thing that would surprise a researcher from ten years ago is mainly the use of graphics cards for general-purpose compute. The shader units of 2005 were only starting to reach a degree of flexibility and power where you could think of using them for GPGPU tasks.
I got my first consumer GPU in 1997 and was thinking about how to do non-graphical tasks on it almost immediately. I didn't come up with anything practically useful back then, and the hardware was much more limited, but I don't think someone from 2006 would find it surprising to hear that this was a thing.
I don't know... CUDA was released almost 9 years ago, so I don't think it's a stretch to suggest that cutting-edge researchers from 10 years ago would have been thinking about using GPUs that way.
Human mind... pshaw. You say this "human" thing is special? I don't see it: in my day we also had protons and electrons... all you're showing me is another mishmash of hadronic matter and leptons. So you've refined the arrangement here and there, okay, but I don't see any fundamental breakthrough, only incremental refinement.
The effectiveness? Ten years ago researchers thought humanlike intelligence must necessarily involve something more than simple techniques. Today that position is a lot harder to defend.