
Preface: AlphaGo is an amazing achievement and does represent an interesting advance in the field.

Yet ... it means very little of what people are predicting it to mean. Slashdot went so far as to say that "We know now that we don't need any big new breakthroughs to get to true AI". The field of ML/AI is stuck in a fight where people want science fiction more than scientific reality. Science fiction is sexy, sells well, and doesn't require the specifics.

Some of the limitations preventing AlphaGo from being general:

+ Monte Carlo tree search (MCTS) is really effective at Go but not applicable to many other domains we care about. If your problem can be framed as {state, action} pairs and you can run simulations to predict outcomes, great; otherwise, not so much (a minimal sketch of what that requires follows this list). Go also has the advantage of perfect information (you know the full state of the board) and deterministic simulation (you know with certainty what the state is after action A).

+ The neural networks (NN) were bootstrapped by predicting the next moves in more matches than any individual human has ever seen, let alone played. It then played more games against itself (cool!) to improve - but it didn't learn from scratch. They're aiming to learn this step without the human database, but it will still be very different (read: inefficient) compared to the type of learning a human does.

+ The hardware requirements were stunning (280 GPUs and 1920 CPUs for the largest variant) and were integral to how well AlphaGo performed - yet adding hardware won't "solve" most other ML tasks. The computational power primarily helped improve MCTS, which roughly equates to "more simulations get a better solution" (though with NNs to guesstimate the outcome of a position instead of having to simulate all the way to an end state).
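To make the simulator requirement concrete, here is a minimal UCT-style MCTS sketch. The Simulator interface (legal_actions, step, rollout, is_terminal) is a hypothetical hand-engineered component - exactly the thing many real-world tasks don't give you - and this is not AlphaGo's code.

    # Minimal UCT-style MCTS sketch. The Simulator (legal_actions, step,
    # rollout, is_terminal) is a hypothetical hand-engineered component.
    import math, random

    class Node:
        def __init__(self, state, parent=None):
            self.state, self.parent = state, parent
            self.children, self.visits, self.value = {}, 0, 0.0

    def uct_search(root_state, sim, n_iter=1000, c=1.4):
        root = Node(root_state)
        for _ in range(n_iter):
            node = root
            # 1. Selection: descend using the UCB1 rule.
            while node.children and not sim.is_terminal(node.state):
                node = max(node.children.values(), key=lambda ch:
                           ch.value / (ch.visits + 1e-9) +
                           c * math.sqrt(math.log(node.visits + 1) / (ch.visits + 1e-9)))
            # 2. Expansion: add a child for every legal action.
            if not sim.is_terminal(node.state):
                for a in sim.legal_actions(node.state):
                    node.children[a] = Node(sim.step(node.state, a), parent=node)
                node = random.choice(list(node.children.values()))
            # 3. Rollout: simulate to the end with a cheap default policy
            #    (two-player sign handling omitted for brevity).
            reward = sim.rollout(node.state)
            # 4. Backpropagation: update statistics back up to the root.
            while node is not None:
                node.visits += 1
                node.value += reward
                node = node.parent
        # Pick the most-visited action at the root.
        return max(root.children.items(), key=lambda kv: kv[1].visits)[0]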

Again, amazing, interesting, stunning, but not an indication we've reached a key AI milestone.

For a brilliant overview: http://www.milesbrundage.com/blog-posts/alphago-and-ai-progr...

John Langford also put his opinion up at: http://hunch.net/?p=3692542

(note: copied from my Facebook mini-rant inspired by Langford, LeCun, and discussions with ML colleagues in recent days)




I took a closer read through the AlphaGo paper today. There are some other features that make it not general.

In particular, the initial input to the neural networks is a 19×19×48 grid, and the layers of this grid include information like:

- How many turns since a move was played

- Number of liberties (empty adjacent points)

- How many opponent stones would be captured

- How many of own stones would be captured

- Number of liberties after this move is played

- Whether a move at this point is a successful ladder capture

- Whether a move at this point is a successful ladder escape

- Whether a move is legal and does not fill its own eyes

All of this is computed before the neural nets even get involved. Some of these features are encoded across 8 binary planes. I would say that for some of these, AlphaGo got domain-specific help in a decidedly non-general way.
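To make "domain-specific help" concrete, here is a rough sketch of what building such input planes could look like: hand-written Go logic runs before the network sees anything. The helper functions (count_liberties, is_ladder_capture, is_legal_non_eye_fill) and the plane indices are illustrative stand-ins, not the paper's exact layout.

    # Rough sketch of domain-specific input planes. Helper functions and
    # plane indices are illustrative, not the paper's exact layout.
    import numpy as np

    def build_input_planes(board, to_play):
        planes = np.zeros((48, 19, 19), dtype=np.float32)  # 48 x 19 x 19 input
        for x in range(19):
            for y in range(19):
                libs = count_liberties(board, x, y)          # hypothetical rules engine
                if libs > 0:
                    # counts one-hot encoded across 8 binary planes (1..8+)
                    planes[min(libs, 8) - 1, x, y] = 1.0
                if is_ladder_capture(board, x, y, to_play):  # hand-coded tactic
                    planes[44, x, y] = 1.0
                if is_legal_non_eye_fill(board, x, y, to_play):
                    planes[47, x, y] = 1.0
        return planes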

It is of course still groundbreaking academically. The architecture is a state-of-the-art deep learning setup, and we learned a ton about how Go and games in general work. The interplay between supervised and reinforcement learning was interesting, especially how the RL-trained policy performed worse in practice when used to select the most likely moves.

Disclaimer: Googler, not in anything AI-related.


Note that these features are for the fast rollout policy. That part needs to be fast, so rather than a net they use a linear policy, and for a linear policy to work it needs good feature selection, which is what this is. At some point, when we have better hardware, you can imagine removing the rollout policy and having just one network.
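As a rough sketch of what "linear policy over hand-selected features" means (the feature extraction is a hypothetical stand-in, not the paper's exact feature set):

    # Fast linear rollout policy sketch: score each legal move as a dot
    # product of hand-crafted features with learned weights, softmax, sample.
    import numpy as np

    def rollout_policy(move_features, weights):
        # move_features: (num_legal_moves, num_features) of hand-built features
        logits = move_features @ weights
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()
        return np.random.choice(len(probs), p=probs)   # sampled move index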


I think you're confusing this list of attributes with a separate list used for rollouts and tree search. The ones above are definitely used for the neural networks in the policy and value networks. See: "Extended Data Table 2: Input features for neural networks. Feature planes used by the policy network (all but last feature) and value network (all features)."


I'm having trouble identifying what algorithmic innovation AlphaGo represents. It looks like a case of more_data + more_hardware. Some are making a big deal of the separate evaluation and policy networks. So, OK, you have an ensemble classifier.

The most theoretically interesting thing to me is the use of stochastic sampling to reduce the search space. Is there any discussion of how well pure Monte Carlo tree search performs here compared to the system incorporating extensive domain knowledge?


Wow, this really took me by surprise. I thought the only input was (s_1...s_final, whowon) during training, where the s are states, and (s_current) during play, and that the system would learn the game on its own. That's the way it worked with the Atari games anyway.


I expect the Atari games, if we're thinking of the same articles, had much less strategic depth than playing a Go champion.


- Since all problems can be converted to Markov Decision Processes, this is a moot point. The conversion may not be efficient in terms of states/actions, but we just nearly solved a problem with more states than there are atoms in the universe, so that hardly matters. In addition, most problems humans face really are in the form of {state, action}. Just because ML is popular right now and it's all about putting labels on data does not make this less true. Other algorithms already exist (POMCP) which are direct transformations of MCTS for uncertain environments, and determinism is not required for either MCTS or POMCP, so that point is void too.

- I don't see what the problem is with that. Nearly all tasks humans would currently gain from automating have plenty of human experience behind them. This is why training in any field even exists. Sure, it would be cool to send a drone into unexplored territory with no knowledge and come back later to a self-built city, but I don't see how putting that milestone a bit later is in any way a problem.

- The hardware requirements for any interesting new breakthrough have always been immense. The same was true of Deep Blue, of the graphene experiments where we could only produce milligrams at a time, or of solar panels: each new discovery innovates in one specific direction. It is just a matter of time before the new method is substantially improved and can run on a single machine. To expect otherwise is foolish; to complain about it is pointless.

EDIT:

This is not to contradict the general point. That we don't need new breakthroughs is so utterly false in so many ways it's not even funny. However, this certainly is an AI breakthrough, one that many AI researchers (myself included) thought would take AT LEAST another 10 years to come to pass. Compressing 10 years into 1 seems like a breakthrough to me.


Thanks for the good discussion :)

+ The issue with MCTS was not that it can't be extended to non-determinism (you're correct re: POMCP) but that it requires a simulator which provides at least a reasonably accurate model of the world. This simulator is almost always hand engineered. Determinism and perfect information simplify both the task and the creation of the simulator. The state in Go also contains all the history required to compute the next optimal move - i.e. the system doesn't need any notion of memory - yet the state in most real-world tasks is far more complicated, at least with regard to what needs to be remembered and how it should be stored.

+ The point about the hardware requirements is that hardware advances will push the state of the art in Go but will not do the same for many other ML tasks. Whether we have 1x or 10x of distributed AlphaGo's computing power in our pocket is not the issue - the issue is that such computing power won't help many tasks, because our ML models for them are not compute bound.

There's also disagreement about how large the "10 year jump" really was, which is discussed in the article by Miles Brundage. Many people (including Michael Bowling, who designed the system that "solved" heads-up limit Texas Hold 'Em) predicted professional-level Go play around now. While the 10-year estimate may be held by many, I also feel it was a media-reinforced one.


Simulators should be fast, sure, but I don't buy that they need to be deterministic to be fast. We have millions of programs that already make use of pseudo-random numbers, and they don't seem to suffer performance problems because of it.

And on the state and memory concern: MCTS does not care about it directly, since the simulator is used as a black box and its internals are irrelevant to the search. As long as any environment configuration can be described as a state (essentially, a unique state->number conversion must be possible - and even then not always), MCTS will work. And since it also does not care about the size of the state space, the concern that having memory as one of the factors in the state would be problematic is also unfounded.

I also disagree on the specificity of AlphaGo. MCTS has been used successfully in many fields after its initial use and tuning for Go; I did my thesis on similar algorithms. In the same way, it does not matter whether AlphaGo can be directly applied to other problems. What matters is the new idea of using NNs to substantially improve, with little overhead, the value estimates MCTS uses to explore the decision tree. That is the true breakthrough. The fact that the first implementation of this idea is a Go-playing program is almost irrelevant; it's more of a showcase of the strength of the approach.
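A minimal sketch of that idea - blending a learned value estimate with a fast rollout when evaluating an MCTS leaf, roughly as the AlphaGo paper describes (value_net and sim.rollout are hypothetical stand-ins):

    # Evaluate a search-tree leaf by mixing a learned value network with a
    # cheap rollout, instead of relying on rollouts alone.
    def evaluate_leaf(state, sim, value_net, lam=0.5):
        v_net = value_net(state)        # learned position evaluation in [-1, 1]
        v_rollout = sim.rollout(state)  # fast simulated game to the end
        return (1 - lam) * v_net + lam * v_rollout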


This is beside your main point, but I just want to add that the 'problem' of Go is not nearly solved. Rather, what is nearly solved is the problem of beating a human Go player. There is quite a difference.


I wouldn't expect global maxima to even be definable for most problems.


I think a "solution" for Go is a program that can play to a win (or a draw, I guess) in any circumstance, right? I mean, like, you can create a tic-tac-toe program that plays perfectly and will never lose, but you cannot do this for go.


... Yet.

Growth mindset.


> the 'problem' of Go is not nearly solved.

And it is very likely that it never will be; the number of combinations is simply too large.


That doesn't (necessarily) mean a proof can't be found that a certain rule set leads to optimal play. Tic-tac-toe can be solved without examining the entire state space of the game.
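For what it's worth, a small sketch of that point for tic-tac-toe: memoizing on symmetry-canonicalized boards solves the game while visiting only a fraction of the raw game tree.

    # Solve tic-tac-toe by minimax, memoized on symmetry-canonicalized
    # boards, so only a fraction of the raw game tree is ever visited.
    from functools import lru_cache

    LINES = [(0,1,2),(3,4,5),(6,7,8),(0,3,6),(1,4,7),(2,5,8),(0,4,8),(2,4,6)]

    def winner(b):
        for i, j, k in LINES:
            if b[i] != '.' and b[i] == b[j] == b[k]:
                return b[i]
        return None

    def canonical(b):
        # all 8 rotations/reflections; keep the lexicographically smallest
        def rot(p): return [p[6],p[3],p[0],p[7],p[4],p[1],p[8],p[5],p[2]]
        def ref(p): return [p[2],p[1],p[0],p[5],p[4],p[3],p[8],p[7],p[6]]
        perms, p = [], list(range(9))
        for _ in range(4):
            p = rot(p); perms += [p, ref(p)]
        return min(''.join(b[i] for i in p) for p in perms)

    @lru_cache(maxsize=None)
    def value(b, player):   # +1 = X wins with best play, -1 = O wins, 0 = draw
        w = winner(b)
        if w: return 1 if w == 'X' else -1
        if '.' not in b: return 0
        nxt = 'O' if player == 'X' else 'X'
        vals = [value(canonical(b[:i] + player + b[i+1:]), nxt)
                for i, c in enumerate(b) if c == '.']
        return max(vals) if player == 'X' else min(vals)

    print(value(canonical('.........'), 'X'))   # 0: perfect play is a draw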


That's mostly because of trivial symmetry, and even taking that into account, the Go state space is astronomical in the most literal sense; simple numbers don't work for describing just how large it is.


> The hardware requirements were stunning (280 GPUs and 1920 CPUs for the largest variant) and were an integral part to how well AlphaGo performed

Is that really true? Demis stated that the distributed version of AlphaGo beats the single-machine version only 75% of the time. That's still stronger than virtually all human players, and it probably would still have beaten Lee Sedol at least once.
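For scale, under the standard logistic Elo model a 75% head-to-head win rate corresponds to roughly a 190-point rating gap (just arithmetic, not a figure from the paper):

    # Convert a 75% head-to-head win rate into an Elo gap
    # (standard logistic Elo model; back-of-envelope only).
    import math
    p = 0.75
    elo_gap = 400 * math.log10(p / (1 - p))
    print(round(elo_gap))   # ~191 Elo points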


The best discussion of hardware-adjusted algorithmic progress I've seen is by Miles Brundage[1]. The additional hardware was roughly the difference between amateur and pro level play. It's also important to note that the non-distributed AlphaGo still had quite considerable compute behind it.

Now was a sweet spot in time. This algorithm ten years ago would likely have been pathetic, and in ten years' time it will likely be superhuman in the same way chess engines are.

None of these constitute a general advance in AI however.

[1]: http://www.milesbrundage.com/blog-posts/alphago-and-ai-progr...


I don't really buy his argument. Lots of other companies with plenty of resources have been attacking this problem, including Facebook and Baidu. People have been talking about Go AIs for decades. If it were just a matter of throwing a few servers at Crazy Stone or another known Go algorithm, it would have been done already.


The companies may have plenty of resources, but those resources were not solely dedicated to this problem. You mention Facebook, and they were indeed putting time into this - though their team is far smaller (1-2 people) and they used less compute. From the linked Miles article:

"Facebook’s darkfmcts3 is the only version I know of that definitely uses GPUs, and it uses 64 GPUs in the biggest version and 8 CPUs (so, more GPUs than single machine AlphaGo, but fewer CPUs). ... Darkfmcts3 achieved a solid 5d ranking, a 2-3 dan improvement over where it was just a few months earlier..."


That can't be right. Amateurs do not beat pro players 25% of the time, yet single machine AlphaGo beats distributed AlphaGo 25% of the time.


I don't think comparing win loss distributions is particularly insightful.

A single machine winning that many games against the distributed version really just says that the value/policy networks matter more than the Monte Carlo tree search. The main difference between the two is the number of tree-search evaluations you can do; it doesn't seem like the parallel version has a more sophisticated model.

This suggests that there are systematic mistakes the single 8-GPU machine makes compared to the distributed 280-GPU machine, but that MCTS can smooth some of the individual mistakes over a bit.

I suspect the general Go-playing population of humans does not share those systematic mistakes, so you likely can't project these win/loss distributions onto games against humans.


Then it would seem that AlphaGo on a single machine isn't equivalent to an amateur. Presumably it's really good even when running on a single machine, and the marginal improvement from each additional machine tails off quite quickly.

But when you're pitting it against a top-class opponent in a match getting global coverage, you want all the incremental improvement you can get.


It is a major AI breakthrough, but not the sci-fi kind. AlphaGo is a true AI for Go in the sense that it develops intuition and anticipation by evaluating probabilities, same as any human player. A Go-specific philosophical zombie, in short. And when we have superhuman philosophical zombies for each problem, what will be the point of a general AI?

Edit: typo


To add to that, even our brain seems to work with dedicated, specialized subsystems. Wild supposition, but maybe general AI could be something like a load balancer/reverse proxy in front of many problem-specific AIs. When a human learns to play Go, is some small part of the brain retrained specifically for that task? If so, then AlphaGo could in fact be a building block. The architecture would look like: "reverse proxy" -> subsystem dedicated to the learning process -> subsystem trained for learning board games -> subsystem trained for playing Go. Absolutely not an expert, so this has surely been thought of before /wild digression


Marvin Minsky thought exactly along these lines. Check out the interview below, especially the last few minutes. All his interviews in that series are well worth watching.

http://youtu.be/wPeVMDYodN8


> The hardware requirements were stunning (280 GPUs and 1920 CPUs for the largest variant)

Stunningly big, or stunningly small?

The CPUs would cost around $65/hour on Google Cloud. I can't immediately find pricing for GPUs on either Amazon or Google, but let's suppose the GPUs double that price.
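Back-of-envelope, using the $65/hour CPU figure above and assuming (hypothetically) that GPUs double the bill and a game runs about 5 hours:

    # Back-of-envelope cost: the $65/hour CPU figure is from above; the 2x
    # GPU multiplier and ~5-hour game length are assumptions.
    cpu_cost_per_hour = 65.0
    total_per_hour = cpu_cost_per_hour * 2      # suppose GPUs double it
    game_hours = 5
    print(total_per_hour * game_hours)          # ~ $650 of cloud time per game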

It's pretty small potatoes.

Especially if you put it in the context of a project with ~15 researchers.


Given Langford's locality-vs-globality argument, the game 4 mistake and the overconfidence AlphaGo showed also become fairly unsurprising. The rate at which compounding error grows for a local decision maker is going to cause these kinds of mistakes more often than not.



