
Preface: AlphaGo is an amazing achievement and does represent an interesting advance in the field.

Yet ... it means very little of what people are predicting it to mean. Slashdot went so far as to say that "We know now that we don't need any big new breakthroughs to get to true AI". The field of ML/AI is stuck in a fight where people want science fiction more than scientific reality. Science fiction is sexy, sells well, and doesn't require the specifics.

Some of the limitations preventing AlphaGo from being general:

+ Monte Carlo tree search (MCTS) is really effective at Go but not applicable to many other domains we care about. If your problem can be framed as {state, action} pairs and you can run simulations to predict outcomes, great; otherwise, not so much (a minimal sketch of what that requires follows this list). Go also has the advantage of perfect information (you know the full state of the board) and deterministic simulation (you know with certainty what the state is after action A).

+ The neural networks (NN) were bootstrapped by predicting the next moves in more matches than any individual human has ever seen, let alone played. It then played more games against itself (cool!) to improve - but it didn't learn from scratch. They're aiming to learn this step without the human database, but it will still be very different (read: inefficient) compared to the type of learning a human does.

+ The hardware requirements were stunning (280 GPUs and 1920 CPUs for the largest variant) and were integral to how well AlphaGo performed - yet adding hardware won't "solve" most other ML tasks. The computational power primarily helped improve MCTS, which roughly equates to "more simulations get a better solution" (though with NNs to guesstimate the outcome of a position instead of having to simulate all the way to an end state).
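To make the simulator requirement concrete, here is a minimal UCT-style MCTS sketch. The Simulator interface (legal_actions, step, rollout, is_terminal) is a hypothetical hand-engineered component - exactly the thing many real-world tasks don't give you - and this is not AlphaGo's code.

    # Minimal UCT-style MCTS sketch. The Simulator (legal_actions, step,
    # rollout, is_terminal) is a hypothetical hand-engineered component.
    import math, random

    class Node:
        def __init__(self, state, parent=None):
            self.state, self.parent = state, parent
            self.children, self.visits, self.value = {}, 0, 0.0

    def uct_search(root_state, sim, n_iter=1000, c=1.4):
        root = Node(root_state)
        for _ in range(n_iter):
            node = root
            # 1. Selection: descend using the UCB1 rule.
            while node.children and not sim.is_terminal(node.state):
                node = max(node.children.values(), key=lambda ch:
                           ch.value / (ch.visits + 1e-9) +
                           c * math.sqrt(math.log(node.visits + 1) / (ch.visits + 1e-9)))
            # 2. Expansion: add a child for every legal action.
            if not sim.is_terminal(node.state):
                for a in sim.legal_actions(node.state):
                    node.children[a] = Node(sim.step(node.state, a), parent=node)
                node = random.choice(list(node.children.values()))
            # 3. Rollout: simulate to the end with a cheap default policy
            #    (two-player sign handling omitted for brevity).
            reward = sim.rollout(node.state)
            # 4. Backpropagation: update statistics back up to the root.
            while node is not None:
                node.visits += 1
                node.value += reward
                node = node.parent
        # Pick the most-visited action at the root.
        return max(root.children.items(), key=lambda kv: kv[1].visits)[0]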

Again, amazing, interesting, stunning, but not an indication we've reached a key AI milestone.

For a brilliant overview: http://www.milesbrundage.com/blog-posts/alphago-and-ai-progr...

John Langford also put his opinion up at: http://hunch.net/?p=3692542

(note: copied from my Facebook mini-rant inspired by Langford, LeCun, and discussions with ML colleagues in recent days)




I took a closer read through the AlphaGo paper today. There are some other features that make it not general.

In particular, the initial input to the neural networks is a 19×19×48 grid, and the layers of this grid include information like:

- How many turns since a move was played

- Number of liberties (empty adjacent points)

- How many opponent stones would be captured

- How many of own stones would be captured

- Number of liberties after this move is played

- Whether a move at this point is a successful ladder capture

- Whether a move at this point is a successful ladder escape

- Whether a move is legal and does not fill its own eyes

All of this is computed before the neural nets even get involved. Some of these features are encoded across 8 binary planes. I would say that for some of these, AlphaGo got domain-specific help in a decidedly non-general way.
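To make "domain-specific help" concrete, here is a rough sketch of what building such input planes could look like: hand-written Go logic runs before the network sees anything. The helper functions (count_liberties, is_ladder_capture, is_legal_non_eye_fill) and the plane indices are illustrative stand-ins, not the paper's exact layout.

    # Rough sketch of domain-specific input planes. Helper functions and
    # plane indices are illustrative, not the paper's exact layout.
    import numpy as np

    def build_input_planes(board, to_play):
        planes = np.zeros((48, 19, 19), dtype=np.float32)  # 48 x 19 x 19 input
        for x in range(19):
            for y in range(19):
                libs = count_liberties(board, x, y)          # hypothetical rules engine
                if libs > 0:
                    # counts one-hot encoded across 8 binary planes (1..8+)
                    planes[min(libs, 8) - 1, x, y] = 1.0
                if is_ladder_capture(board, x, y, to_play):  # hand-coded tactic
                    planes[44, x, y] = 1.0
                if is_legal_non_eye_fill(board, x, y, to_play):
                    planes[47, x, y] = 1.0
        return planes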

It is of course still groundbreaking academically. The architecture is a state-of-the-art deep learning setup, and we learned a ton about how Go and games in general work. The interplay between supervised and reinforcement learning was interesting, especially how the RL-trained policy performed worse in practice when used to select the most likely moves.

Disclaimer: Googler, not in anything AI-related.


Note that these features are for the fast rollout policy. That part needs to be fast, so rather than a net they use a linear policy, and for a linear policy to work it needs good feature selection, which is what this is. At some point, when we have better hardware, you can imagine removing the rollout policy and having just one network.
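As a rough sketch of what "linear policy over hand-selected features" means (the feature extraction is a hypothetical stand-in, not the paper's exact feature set):

    # Fast linear rollout policy sketch: score each legal move as a dot
    # product of hand-crafted features with learned weights, softmax, sample.
    import numpy as np

    def rollout_policy(move_features, weights):
        # move_features: (num_legal_moves, num_features) of hand-built features
        logits = move_features @ weights
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()
        return np.random.choice(len(probs), p=probs)   # sampled move index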


I think you're confusing this list of attributes with a separate list used for rollouts and tree search. The ones above are definitely used for the neural networks in the policy and value networks. See: "Extended Data Table 2: Input features for neural networks. Feature planes used by the policy network (all but last feature) and value network (all features)."


I'm having trouble identifying what algorithmic innovation AlphaGo represents. It looks like a case of more_data + more_hardware. Some are making a big deal of the separate evaluation and policy networks. So, OK, you have an ensemble classifier.

The most theoretically interesting thing to me is the use of stochastic sampling to reduce the search space. Is there any discussion of how well pure Monte Carlo tree search performs here compared to the system incorporating extensive domain knowledge?


Wow, this really took me by surprise. I thought the only input was (s_1...s_final, whowon) during training, where the s are states, and (s_current) during play, and that the system would learn the game on its own. That's the way it worked with the Atari games anyway.


I expect the Atari games, if we're thinking of the same articles, had much less strategic depth than playing a Go champion.


- Since all problems can be converted to Markov Decision Processes, this is a moot point. The conversion may not be efficient in terms of states/actions, but we just nearly solved a problem with more states than there are atoms in the universe, so that hardly matters. In addition, most problems humans face really are in the form of {state, action}. Just because ML is popular right now and it's all about putting labels on data does not make this less true. Other algorithms already exist (POMCP) which are direct transformations of MCTS for uncertain environments, and determinism is not required for either MCTS or POMCP, so that point is void too.

- I don't see what the problem is with that. Nearly all tasks humans would currently gain from automating have plenty of human experience behind them. This is why training in any field even exists. Sure, it would be cool to send a drone into unexplored territory with no knowledge and come back later to a self-built city, but I don't see how putting that milestone a bit later is in any way a problem.

- The hardware requirements for any interesting new breakthrough have always been immense. The same was true of Deep Blue, of the graphene experiments where we could only produce milligrams at a time, or of solar panels: each new discovery innovates in one specific direction. It is just a matter of time before the new method is substantially improved and can run on a single machine. To expect otherwise is foolish; to complain about it is pointless.

EDIT:

This is not to contradict the general point. That we don't need new breakthroughs is so utterly false in so many ways it's not even funny. However, this certainly is an AI breakthrough, one that many AI researchers (myself included) thought would take AT LEAST another 10 years to come to pass. Compressing 10 years into 1 seems like a breakthrough to me.


Thanks for the good discussion :)

+ The issue with MCTS was not that it can't be extended to non-determinism (you're correct re: POMCP) but that it requires a simulator which provides at least a reasonably accurate model of the world. This simulator is almost always hand engineered. Determinism and perfect information simplify both the task and the creation of the simulator. The state in Go also contains all the history required to compute the next optimal move - i.e. the system doesn't need any notion of memory - yet the state in most real-world tasks is far more complicated, at least with regard to what needs to be remembered and how it should be stored.

+ The point about the hardware requirements is that hardware advances will push the state of the art in Go but will not do the same for many other ML tasks. Whether we have 1x or 10x of distributed AlphaGo's computing power in our pocket is not the issue - the issue is that such computing power won't help many tasks, because our ML models for them are not compute bound.

There's also disagreement about how large the "10 year jump" really was, which is discussed in the article by Miles Brundage. Many people (including Michael Bowling, who designed the system that "solved" heads-up limit Texas Hold 'Em) predicted professional-level Go play around now. While the 10-year estimate may be held by many, I also feel it was a media-reinforced one.


Simulators should be fast, sure, but I don't buy that they need to be deterministic to be fast. We have millions of programs that already make use of pseudo-random numbers, and they don't seem to suffer performance problems because of it.

And on the state and memory concern: MCTS does not care about it directly, since the simulator is used as a black box and its internals are irrelevant to the search. As long as any environment configuration can be described as a state (essentially, a unique state->number conversion must be possible - and even then not always), MCTS will work. And since it also does not care about the size of the state space, the concern that having memory as one of the factors in the state would be problematic is also unfounded.

I also disagree on the specificity of AlphaGo. MCTS has been used successfully in many fields after its initial use and tuning for Go; I did my thesis on similar algorithms. In the same way, it does not matter whether AlphaGo can be directly applied to other problems. What matters is the new idea of using NNs to substantially improve, with little overhead, the value estimates MCTS uses to explore the decision tree. That is the true breakthrough. The fact that the first implementation of this idea is a Go-playing program is almost irrelevant; it's more of a showcase of the strength of the approach.
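A minimal sketch of that idea - blending a learned value estimate with a fast rollout when evaluating an MCTS leaf, roughly as the AlphaGo paper describes (value_net and sim.rollout are hypothetical stand-ins):

    # Evaluate a search-tree leaf by mixing a learned value network with a
    # cheap rollout, instead of relying on rollouts alone.
    def evaluate_leaf(state, sim, value_net, lam=0.5):
        v_net = value_net(state)        # learned position evaluation in [-1, 1]
        v_rollout = sim.rollout(state)  # fast simulated game to the end
        return (1 - lam) * v_net + lam * v_rollout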


This is beside your main point, but I just want to add that the 'problem' of Go is not nearly solved. Rather, what is nearly solved is the problem of beating a human Go player. There is quite a difference.


I wouldn't expect global maxima to even be definable for most problems.


I think a "solution" for Go is a program that can play to a win (or a draw, I guess) in any circumstance, right? I mean, like, you can create a tic-tac-toe program that plays perfectly and will never lose, but you cannot do this for go.


... Yet.

Growth mindset.


> the 'problem' of Go is not nearly solved.

And it is very likely that it never will be; the number of combinations is simply too large.


That doesn't (necessarily) mean a proof can't be found that a certain rule set leads to optimal play. Tic-tac-toe can be solved without examining the entire state space of the game.
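For what it's worth, a small sketch of that point for tic-tac-toe: memoizing on symmetry-canonicalized boards solves the game while visiting only a fraction of the raw game tree.

    # Solve tic-tac-toe by minimax, memoized on symmetry-canonicalized
    # boards, so only a fraction of the raw game tree is ever visited.
    from functools import lru_cache

    LINES = [(0,1,2),(3,4,5),(6,7,8),(0,3,6),(1,4,7),(2,5,8),(0,4,8),(2,4,6)]

    def winner(b):
        for i, j, k in LINES:
            if b[i] != '.' and b[i] == b[j] == b[k]:
                return b[i]
        return None

    def canonical(b):
        # all 8 rotations/reflections; keep the lexicographically smallest
        def rot(p): return [p[6],p[3],p[0],p[7],p[4],p[1],p[8],p[5],p[2]]
        def ref(p): return [p[2],p[1],p[0],p[5],p[4],p[3],p[8],p[7],p[6]]
        perms, p = [], list(range(9))
        for _ in range(4):
            p = rot(p); perms += [p, ref(p)]
        return min(''.join(b[i] for i in p) for p in perms)

    @lru_cache(maxsize=None)
    def value(b, player):   # +1 = X wins with best play, -1 = O wins, 0 = draw
        w = winner(b)
        if w: return 1 if w == 'X' else -1
        if '.' not in b: return 0
        nxt = 'O' if player == 'X' else 'X'
        vals = [value(canonical(b[:i] + player + b[i+1:]), nxt)
                for i, c in enumerate(b) if c == '.']
        return max(vals) if player == 'X' else min(vals)

    print(value(canonical('.........'), 'X'))   # 0: perfect play is a draw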


That's mostly because of trivial symmetry, and even taking that into account, the Go state space is astronomical in the most literal sense; simple numbers don't work for describing just how large it is.


> The hardware requirements were stunning (280 GPUs and 1920 CPUs for the largest variant) and were an integral part to how well AlphaGo performed

Is that really true? Demis stated that the distributed version of AlphaGo beats the single-machine version only 75% of the time. That's still stronger than virtually all human players, and it probably would still have beaten Lee Sedol at least once.
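For scale, under the standard logistic Elo model a 75% head-to-head win rate corresponds to roughly a 190-point rating gap (just arithmetic, not a figure from the paper):

    # Convert a 75% head-to-head win rate into an Elo gap
    # (standard logistic Elo model; back-of-envelope only).
    import math
    p = 0.75
    elo_gap = 400 * math.log10(p / (1 - p))
    print(round(elo_gap))   # ~191 Elo points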


The best discussion of hardware-adjusted algorithmic progress I've seen is by Miles Brundage[1]. The additional hardware was roughly the difference between amateur and pro level play. It's also important to note that the non-distributed AlphaGo still had quite considerable compute behind it.

Now was a sweet spot in time. This algorithm ten years ago would likely have been pathetic, and in ten years' time it will likely be superhuman in the same way chess engines are.

None of these constitute a general advance in AI however.

[1]: http://www.milesbrundage.com/blog-posts/alphago-and-ai-progr...


I don't really buy his argument. Lots of other companies with plenty of resources have been attacking this problem, including Facebook and Baidu. People have been talking about Go AIs for decades. If it were just a matter of throwing a few servers at Crazy Stone or another known Go algorithm, it would have been done already.


The companies may have plenty of resources, but those resources were not solely dedicated to this problem. You mention Facebook, and they were indeed putting time into this - though their team is far smaller (1-2 people) and they used less compute. From the linked Miles article:

"Facebook’s darkfmcts3 is the only version I know of that definitely uses GPUs, and it uses 64 GPUs in the biggest version and 8 CPUs (so, more GPUs than single machine AlphaGo, but fewer CPUs). ... Darkfmcts3 achieved a solid 5d ranking, a 2-3 dan improvement over where it was just a few months earlier..."


That can't be right. Amateurs do not beat pro players 25% of the time, yet single machine AlphaGo beats distributed AlphaGo 25% of the time.


I don't think comparing win loss distributions is particularly insightful.

A single machine winning that many games against the distributed version really just says that the value/policy networks matter more than the Monte Carlo tree search. The main difference between the two is the number of tree-search evaluations you can do; it doesn't seem like the parallel version has a more sophisticated model.

This suggests that there are systematic mistakes the single 8-GPU machine makes compared to the distributed 280-GPU machine, but that MCTS can smooth some of the individual mistakes over a bit.

I suspect the general Go-playing population of humans does not share those systematic mistakes, so you likely can't project these win/loss distributions onto games against humans.


Then it would seem that AlphaGo on a single machine isn't equivalent to an amateur. Presumably it's really good even when running on a single machine, and the marginal improvement from each additional machine tails off quite quickly.

But when you're pitting it against a top-class opponent in a match getting global coverage, you want all the incremental improvement you can get.


It is a major AI breakthrough, but not the sci-fi kind. AlphaGo is a true AI for Go in the sense that it develops intuition and anticipation by evaluating probabilities, same as any human player. A Go-specific philosophical zombie, in short. And when we have superhuman philosophical zombies for each problem, what will be the point of a general AI?

Edit: typo


To add to that, even our brain seems to work with dedicated, specialized subsystems. Wild supposition, but maybe general AI could be something like a load balancer/reverse proxy in front of many problem-specific AIs. When a human learns to play Go, is some small part of the brain retrained specifically for that task? If so, then AlphaGo could in fact be a building block. The architecture would look like: "reverse proxy" -> subsystem dedicated to the learning process -> subsystem trained for learning board games -> subsystem trained for playing Go. Absolutely not an expert, so this has surely been thought of before /wild digression


Marvin Minsky thought exactly along these lines. Check out the interview below, especially the last few minutes. All his interviews in that series are well worth watching.

http://youtu.be/wPeVMDYodN8


> The hardware requirements were stunning (280 GPUs and 1920 CPUs for the largest variant)

Stunningly big, or stunningly small?

The CPUs would cost around $65/hour on Google Cloud. I can't immediately find pricing for GPUs on either Amazon or Google, but let's suppose the GPUs double that price.
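Back-of-envelope, using the $65/hour CPU figure above and assuming (hypothetically) that GPUs double the bill and a game runs about 5 hours:

    # Back-of-envelope cost: the $65/hour CPU figure is from above; the 2x
    # GPU multiplier and ~5-hour game length are assumptions.
    cpu_cost_per_hour = 65.0
    total_per_hour = cpu_cost_per_hour * 2      # suppose GPUs double it
    game_hours = 5
    print(total_per_hour * game_hours)          # ~ $650 of cloud time per game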

It's pretty small potatoes.

Especially if you put it in the context of a project with ~15 researchers.


Given Langford's locality-vs-globality argument, the game 4 mistake and the overconfidence AlphaGo showed also become fairly unsurprising. The rate at which compounding error grows for a local decision maker is going to cause these kinds of mistakes more often than not.



