The fact that they only used self play with no outside input here is really interesting. I wonder if this system produced more new styles of play. While I am not that familiar with Go, I know in some of the other articles they talk about things like Chinese starts that are specific to certain cultures. I wonder if the fact that it had no outside reinforcement made it produce movements that we have already seen that are somehow inherent to the game, or if it produced many more new moves that were a result of it learning without any possibility of cultural interference. According to the article it did invent some unconventional and creative moves, but I also wonder how much it rediscovered.
I also wonder how much its style of play would change if it were retrained, due to the random start that it is given. Maybe that would produce something like seeds for procedurally generated worlds in games. Like if they could find a seed for Chinese or Japanese players, or ones with more aggressive styles. This is some pretty cool work and may open up even more doors for pure reinforcement learning.
I don't think it's an overstatement to say that, since playing Lee Sedol in 2016, AlphaGo has completely revolutionized professional and amateur go. It's certainly not unprecedented — the last major revolution happened in the early 20th century (often called the 'Shin Fuseki' era [0]) — but AlphaGo has demonstrably surpassed any previous high-water mark.
> I wonder if this system produced more new styles of play.
Absolutely. One such innovation has been the use of early 3-3 invasions [1]. There are many more, and indeed AlphaGo's games are still being analyzed by professional players. Michael Redmond, a 9-dan professional, has been working with the American Go Association on one such series [2].
> I wonder if the fact that it had no outside reinforcement made it produce movements that we have already seen that are somehow inherent to the game...
Interestingly, yes. Strong players have commented that AlphaGo seems to agree with things that players like Go Seigen [3] have suggested in the past, but that were never fully developed or understood [4].
I have only skimmed the paper but one thing I don't see any discussion of is whether komi (the compensation given to white for going second) is correct.
They do say the rules used for all games, including self-play, set komi consistently to 7.5.
If the strongest AI was consistently winning predominantly with one color it would be an indication that komi isn't fair for the best play.
Of the 20 games released for the strongest play it appears white won 14 times and black 6. I don't think that is enough to be conclusive but maybe komi is too high.
I wonder if different "correct" play at the strongest levels would be learned with a 6.5 komi.
You can only change komi by full point increments. There is a .5 to break ties, but a komi of 7.5 is identical to one of 7.4.
From a theoretical standpoint, any non-integer komi should lead to one player winning 100% of the time. So even if the actual win ratio is 14:6 at komi=7.5 that might still be the best value.
If you had an estimate of the real difference, you could switch to breaking ties randomly. Black wins 60% of the ties, white wins 40%. There will be a ratio at which each side should win 50% of the time.
I agree that with perfect play it would come down to a tie, broken 50/50 for each side. But it is still interesting to ask for a better estimate for practical play.
Michael Redmond mentions this in the AlphaGo vs AlphaGo review series he's doing with the AGA. AlphaGo self-play games are with 7.5 komi under Chinese rules, and apparently, DeepMind has stated that black vs white wins is almost exactly 50/50. IIRC Redmond mentioned that white (?) only had some sub 1% advantage in the entire self-play corpus.
I don't remember where I read it but in some earlier versions of AlphaGo they tried a komi of 6.5 and black ended up winning more often. That indicates the correct komi value is 7, but since Go doesn't have ties, you have to pick which side you want to favor to break the tie. (White seems reasonable.)
Well in Japanese rules the komi is 6.5, so that's the alternative that tends to come up. With some quick searching I found a transcript from one of the games where DeepMind said 7.5 slightly favors white but they didn't say anything about 6.5 or 5.5, while a random comment from r/baduk claims that pro game analysis shows 6.5 slightly favors black and 7.5 slightly favors white.
The correct komi number has puzzled Go players for centuries; now we might finally have a chance to figure out the right answer (although not without some reservations). Over the last 5 decades, komi has consistently been raised to keep the game more level between white and black (black makes the first move, so has the advantage). Historically, there was no komi, and people kept things even by always playing an even number of games, with each player switching sides after each game.
For whatever reason, that's no longer feasible in modern pro play (not to mention that it could result in no winner if each player wins half the games), so komi was introduced: at first at 5.5, and it steadily climbed to 7.5 at present. In pro play, even a change of 1 is considered a big deal, so going from 5.5 to 7.5 is hardly trivial.
Now with AlphaGo playing "perfect" games against itself, we might finally be able to put to rest the debate over the correct komi (the Japanese Go associations have for decades kept meticulous records of every professional game, in order to find the correct komi).
There is a big "but" though. The correct komi at AlphaGo Zero's level might not be the correct komi for human-level players (AlphaGo is estimated to be 2-3 handicap stones above top human play; that is a bigger gap than the one between the average pro player and the best amateurs).
Indeed, the change from 5.5 komi to 7.5 komi also had a lot to do with the change in play style rather than simply zooming in on the "correct" komi number. In the 70s and 80s, the predominant play style was more conservative, and 5.5 might well have been the correct komi for the time (defined as resulting in a 50:50 chance of winning for either side). As play style shifted to become more aggressive and confrontational (actually fueled somewhat by the introduction of komi), it was discovered that komi needed to be raised to keep the chances of winning at 50:50.
To make an analogy, suppose one is playing a casino game of chance that gives the house a slight advantage (similar to the first-mover advantage for black in go). If one only makes small bets, the house will end up winning only a small amount. In other words, the player needs to be compensated by a small amount to make the game "fair".
If however one makes big bets (i.e. more aggressive game play), then the compensation needs to be bigger too, to make the game "fair", even if the underlying probabilities have not changed.
Following this logic, while 7.5 komi is fair for AlphaGo vs. AlphaGo games, it might not be the right number for human games. I suspect it might be smaller for humans... if only we could calibrate AlphaGo to the average human level and generate millions of self-play games...
With respect to your very interesting comment (I genuinely appreciate your input), you appear to have misunderstood the comment you were replying to.
You've commented on the differences in the style of play that AlphaGo introduced, but the post you were replying to (by aeleos) was going a step further and hypothesising about the potential for a newer, completely 'non-human' style that AlphaGo Zero may have created.
Your comments definitely contribute to the discussion but it was bugging me that there appeared to be a tangent forming about AlphaGo that was overlooking AlphaGo Zero which would be the more interesting area to explore.
Yes, aeleos was interested in that, and so am I, and it seems to be what this entire thread _should be about_. kndyry steered back towards AlphaGo. I'm not sure this merits any further dissection.
> To assess the merits of self-play reinforcement learning, compared to learning from human data, we trained a second neural network (using the same architecture) to predict expert moves in the KGS Server data set; this achieved state-of-the-art prediction accuracy compared to previous work [12, 30–33] (see Extended Data Tables 1 and 2 for current and previous results, respectively). Supervised learning achieved a better initial performance, and was better at predicting human professional moves (Fig. 3). Notably, although supervised learning achieved higher move prediction accuracy, the self-learned player performed much better overall, defeating the human-trained player within the first 24 h of training. This suggests that AlphaGo Zero may be learning a strategy that is qualitatively different to human play.
That is really interesting. Given a neural network that solely exists to play Go, one that is influenced by the human mind is limited compared to the exact same set of neurons that doesn't have that influence.
EDIT: changed a set of neurons to neural network per andbberger's comments
Please don't refer to it as 'a set of neurons' - it only serves to fuel the (IMO) absolutely ridiculous AI winter fearmongering, and is also just a bad description. Neural nets are linear algebra blackboxes, the connections to biology are tenuous at best.
Sorry to be that guy, but the AI hype is getting out of hand. COSYNE this year was packed with papers comparing deep learning to the brain... it drives me nutty. Convnets can be reasonably put into analogy with the visual system... because they were inspired by it. But that's about it.
To address your actual comment: I would argue that this is not really interesting or surprising (at least to the ML practitioner), it is very well known that neural nets are incredibly sensitive to initialization. Think of it like this: as training progresses, parameters of neural nets move along manifolds in parameter space, but they can get nudged off of the "right" manifold and will never be able to recover.
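To see what I mean about initialization, here's a throwaway toy (a numpy sketch, nothing to do with AlphaGo or any real architecture): plain gradient descent on a bumpy 1D loss ends up in a different minimum depending purely on where it starts.

```python
import numpy as np

# Toy non-convex loss with two minima, roughly at x ~ -1.3 (the better one) and x ~ +1.1.
def loss(x):
    return x**4 - 3 * x**2 + x

def grad(x):
    return 4 * x**3 - 6 * x + 1

def descend(x0, lr=0.01, steps=2000):
    x = x0
    for _ in range(steps):
        x -= lr * grad(x)
    return x

rng = np.random.default_rng(0)
for x0 in rng.uniform(-2, 2, size=5):
    x = descend(x0)
    print(f"start {x0:+.2f} -> ends at {x:+.2f} (loss {loss(x):+.2f})")
```

Same optimizer, same loss, different random start, different answer. Scale that up to millions of parameters on a wildly non-convex landscape and it's no shock that a human-initialized net and a randomly-initialized net end up in different places.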
Sorry for the rant, the AI hype is just getting really out of hand recently.
Machine learning is specifically not magic. Blackboxes are not useful. Convnets work so well because they build the symmetries of natural scenes directly into the model - natural scenes are translation invariant (as well as a couple of other symmetries), anything that models them sure as hell better have those symmetries too, or you're just crippling your model with extra superfluous parameters.
I changed my comment to neural network since a set of neurons is somewhat wrong, but I don't really agree that there isn't much of a connection between this and biology. There might not be much of a connection between how they currently work and how our brains work, but the whole point of machine learning and neural networks is to improve computers' performance on the things we are good at. And while it was originally loosely modeled on the brain and might work differently now, that doesn't mean people can't compare it to the brain. It would be wrong to say it is exactly like the brain, but I don't think there is anything wrong with comparing and contrasting the two. If our goal is to improve performance and we are the benchmark, then why shouldn't we compare them?
What I found interesting was mainly that it was us who nudged it into the "wrong" manifold in the parameter space you talked about, especially given how old and complicated Go is. The sheer amount of human brain power that has been put into getting good at the game wasn't able to find certain aspects of it, and in 60 hours of training a neural network was able to.
I'm not saying there is nothing of value to be obtained by investigating connections between ML and the brain. That's how I got into ML in the first place, doing theoretical neuro research.
We absolutely should and do look to the brain for inspiration.
I'm taking issue with the rather ham-fisted series of papers that have come out in recent years aggressively pushing the agenda of connections between ML and neuro that just aren't there.
Are you sure that humans have done more net compute on Go than DeepMind just did? The Go game tree is _enormous_, and humans are biased. We don't invent strategies from scratch; we use heuristics handed down to us from the pros (who in turn were handed down the heuristics from their mentors).
To me, it's not so interesting or surprising that the human initialized net performed worse. We just built the same biases and heuristics we have into that net.
As far as we know the brain is just a "linear algebra blackbox". It's an uninteresting reduction since linear algebra can describe almost everything. Yes NNs aren't magic, but neither is the brain. Likely they use similar principles. Hinton has a theory about how real neurons might be implementing a variation of backpropagation and there are a number of other theories.
>As far as we know the brain is just a "linear algebra blackbox"...Likely they use similar principles.
I'm not an expert, but my impression is that this is not really a reasonable claim, unless you're only considering very small function-like subsystems of the brain (e.g. visual cortex). Neural nets (of the nonrecurrent sort) are strict feed-forward function approximators, whereas the brain appears to be a big mess of feedback loops that is capable of (sloppily, and with much grumbling) modeling any algorithm you could want, and, importantly, adding small recursions/loops to the architecture as needed rather than a) unrolling them all into nonrecursive operations (like a feedforward net) or b) building them all into one central singly-nested loop (like an RNN).
The brain definitely seems to be using something backprop-like (in that it identifies pathways responsible for negative outcomes and penalizes them). But brains also seem to make efficiency improvements really aggressively (see: muscle memory, chunking, and other markers of proficiency), even in the absence of any external reward signal, which seems like something we don't really have a good analogue for in ANNs.
There are some parts of the brain we have no clue about. Episodic memory or our higher level ability to reason. But most of the brain is just low level pattern matching just like what NNs do.
The constraints you mention aren't deal breakers. We can make RNNs without maintaining a global state and fully unrolling the loop. See synthetic gradients for instance. NNs can do unsupervised learning as well, through things like autoencoders.
> It's an uninteresting reduction since linear algebra can describe almost everything.
The question is whether it can do so efficiently. As far as I know, alternating applications of affine transforms and non-linearities are not so useful for some computations that are known to occur in the brain such as routing, spatio-temporal clustering, frequency filtering, high-dimensional temporal states per neuron etc.
If he changes his opinion, which I understand in this case to be about models of the brain, and each iteration improves the model, then that is perfectly fine. It would be bad if someone did not change their view in the face of inconsistent evidence.
For political opinions sure, but if he's changing his opinions so often ...
When you're a big scientific figure, I think that you have some extra responsibility to the public to only say things you're very confident about. Or otherwise very clearly communicate your uncertainty!!
Agreed. If we announced that A* search is superhuman at finding best routes, most technorati wouldn't bat an eye. Technically it is probably accurate to say that the results here show that neural networks can find good heuristics for MCTS search through unsupervised training in the game of Go. According to the DeepMind authors:
"These search probabilities
usually select much stronger moves than the raw move probabilities of the neural network; MCTS may therefore be viewed as a powerful policy improvement operator. Self-play with search – using the improved MCTS-based policy to select each move, then using the game winner as a sample of the value – may be viewed as a powerful policy evaluation operator. The main idea of our reinforcement learning algorithm is to use these search operators repeatedly in a policy iteration procedure ..."
The fact that this reinforcement training is unsupervised from the very beginning is quite exciting and may lead to better heuristics for other kinds of combinatorial optimization problems.
Fully observable and we still have no idea what the hell it's doing.
Makes neuroscience seem kinda bleak doesn't it?
There has been a lot of great work lately building up a theory of how these things work, but it is very much still in the early stage. Jascha Sohl-Dickstein in particular has been doing some great work on this.
We don't even have answers to the most basic questions.
For instance (pedagogically), how the hell is it possible to train these things at all? They have ridiculously non-convex loss landscapes and we optimize in the dumbest conceivable way, first-order stochastic gradient descent. This should not work. But it does, all too often.
Not a great example because there are easy hand wavy arguments as to why it should work, but as far as proofs go...
The hand wavy argument goes as follows:
- we're in like a 10000 dimensional space, so for the stationary point we're at to be a true local minimum, every one of those 10000 dimensions would have to go uphill in both directions. It's overwhelmingly likely that there's at least one way out (rough numbers sketched below)
- there are many many different ways to set the params of the net for each function. Permutation is a simple example.
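Rough numbers for that first point, under the (purely hand-wavy) toy assumption that each dimension independently has a 50/50 chance of curving upward at a stationary point:

```python
import math

# Toy independence model only: P(true local minimum) = 0.5**d if each of d
# dimensions independently curves upward with probability 0.5.
for d in (10, 100, 10_000):
    log10_p = d * math.log10(0.5)
    print(f"d = {d:>6}: P(local minimum) ~ 10^{log10_p:.0f}")
```

So in very high dimensions almost every stationary point you hit is a saddle with an escape route, which is part of why dumb first-order SGD gets away with it.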
The tools we have developed so far are limited, but that's different from "there's no way". Many academics are working hard right now to better understand deep neural networks.
Yes, it turns out you can find meaningful information. etiam provided this https://arxiv.org/pdf/1312.6034.pdf The main issue is making sure what you are looking for is actually what the network is doing. You have to correctly interpret and visualize a jumble of numbers, which usually requires a hypothesis about how it worked in the first place. But assuming both go well you can gain meaningful information.
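For reference, the core trick in that paper (gradients of the class score with respect to the input) is only a few lines. A hedged PyTorch-style sketch, where `model` and `image` are placeholders for whatever classifier and input tensor you already have:

```python
import torch

def saliency_map(model, image, target_class):
    # Gradient-based saliency in the spirit of Simonyan et al. 2013 (the link above).
    model.eval()
    x = image.clone().unsqueeze(0).requires_grad_(True)  # shape (1, C, H, W)
    score = model(x)[0, target_class]                    # raw class score
    score.backward()                                     # d(score)/d(input pixels)
    return x.grad.abs().squeeze(0).max(dim=0).values     # (H, W) importance map
```

The hard part, as you say, is the interpretation step afterwards: a bright pixel tells you the score is locally sensitive there, not that your hypothesis about what the network "looks for" is right.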
> things like Chinese starts that are specific to certain cultures
While it's true that there are national styles of play, the Chinese opening is not called that because it's really popular among Chinese people. It's called that because a particular Chinese pro helped popularize it, even though it was invented by a Japanese amateur.
See https://en.wikipedia.org/wiki/Chinese_opening for some more info. FWIW, I (a caucasian American) use this opening all the time. It's just a generally good opening if you like a certain style of play.
> talk about things like Chinese starts that are specific to certain cultures
Came here to make this point.
It's Chinese Opening, not Chinese start – similarly recall that you have the French Defense / Italian Defense / Scandinavian Defense among chess opening variations and none of these implies that that opening variation is specific to that culture or nation.
I'm not 100% sure I agree. It values probability of victory because that's its goal. For humans, aiming only for probability of victory might not be as good, because we're much worse at estimating probabilities. So aiming to maintain a large margin at all times is conceivably the best proxy that we can use in practice.
Agreed. I know I'm winning by 4 points but I have no idea about my probability of winning. However if I'm winning I know that I should play low risk moves and refrain from starting complicated fights. That increases the probability of winning. IMHO the exact value is out of reach for human beings.
It would be interesting to play human go, assisted by a go computer that doesn't say anything about moves, but rather just spits out, for each player, their current likelihood of victory if all further moves by both players were "what it would do."
That way, each player could know, at all times, (one major factor that goes into) their probability of winning. They'd still have to mentally adjust it for the likelihood of them and their opponent making an error, and how that can be controlled by making intimidating moves, etc. But it could lead to much tighter control on the abstract flow of the game.
It'd almost be like the computer was the general, issuing strategy, picking battles; and the human player the tactician, fighting those battles.
There is a computer go program (maybe Crazy Stone?) that analyzes a game record and annotates it with the winning percentage for every move.
Knowing that the opponent's winning probability changed from 52% to 57% is interesting only because it hints at a mistake. In the case of such a large change the program suggests the move it would have played.
I saw an annotated game record and there were no variations: I remember a suggested move that made me wonder "why!?".
Another benefit of seeing the value of the winning probability is an assessment of who's ahead. However, that's already possible with the score estimation that programs and go servers provide. Sometimes it's crude, sometimes it's good, but it's the score, not the winning probability, that humans can estimate when playing. The best probability estimate I can make is: if the score is close and the game is still complicated, it's 50-50; if the score is close but the game is almost over, it's 95-5 for whoever's ahead. If the score is not close, the player with more points will probably win.
The Go community learned that the margin of victory is meaningless a long time ago. The most famous game of Honinbo Dosaku, a famous Go player from the late 1600s, is arguably a game where he gave a handicap to his opponent and lost by one point. Lee Chang-Ho, who was the reigning champion in the late 90s, had a style that consistently tried to win by small margins.
AlphaGo now appears to be better than humans in all aspects of gameplay, and it is better at calculating very thin margins of probability than a human can be. This is not unique to any individual aspect of its gameplay; against humans it can also win by huge margins depending on what mistakes the human makes.
I think if AlphaGo foresaw it was losing by one point, it would start playing reckless moves, as it did against Lee Sedol in the only match it lost to him.
Possibly a dumb q, but is ‘self play’ in any way related to ‘adversarial’ learning? I don’t see it mentioned in the article, but it reminds me of the principle.
In some ways it is, but the main difference is that adversarial learning (usually) produces a second neural network whose purpose is to exploit weaknesses in the first. Reinforcement learning does not produce a second neural network to beat the first; it uses what it learned solely to improve the original.
As a side note, the main application I have seen in adversarial learning research is photo recognition, but I guess you could have an adversarial network exist to help improve an object recognition network. At that point it would probably become something between adversarial and reinforcement learning. However, game-based reinforcement learning doesn't require a second specific network as the adversary; it can easily just be paired against itself.
It isn't a dumb question, they are very similar in some ways. They mainly differ in what exactly the goal of the opponent is. In this case, it is to help improve itself, however in typical adversarial situations it is solely to exploit (become its adversary).
Yes, this is not an AGI. But the hockey-stick takeoff from defeating some players, to defeating an undefeated world champion, to defeating the version of itself that beat the world champion 100% of the time is nuts. If this happens in other domains, like finance, health, paper clip collection, the word singularity is really well chosen--we can't see past this.
While this is promising, there's a long way to go between this and the other things you mentioned. Go is very well-defined, has an unequivocal objective scoring system that can be run very quickly, and can be simulated in such a way that the system can go through many, many iterations very quickly.
There's no way to train an AI like this for, say, health: We cannot simulate the human body to the level of detail that's required, and we definitely aren't going to be able to do it at the speed required for a system like this for a very long time. Producing a definitive, objective score for a paper clip collection is very difficult if not impossible.
AlphaGo/DeepMind represents a very strong approach to a certain set of well-defined problems, but most of the problems required for a general AI aren't well-defined.
> Do you care to give an example? Are they more or less well defined than find-the-cat-in-the-picture problem?
You mean like go over and feed the neighbor's cat while they're on vacation?
How about instead, being able to clean any arbitrary building?
Go isn't remotely similar to the real world. It's a board game. A challenging one, sure, and AlphaGo is quite a feat, but it's not exactly translatable to open ended tasks with variable environments and ill-specified rules (maybe the neighbor expects you to know to water the plants and feed the goldfish as well).
At this point, there is no evidence that the limiting factor in these cases is AI/software.
The limiting factor with the neighbor's cat is the robotics of having a robust body and arm attachment. We know that the scope of current AI can:
1) Identify a request to feed a cat
2) Identify the cat, cat food and cat's bowl from camera data
3) Navigate an open space like a house
Being able to clean an arbitrary building is also more the challenge of building the robot than the AI identifying garbage on a floor or how to sweep something.
It is not clear there are hard theoretical limits on an AI any more. There are economic limits based on the cost of a programmer's attention. There are lots of hardware limits (including processor power).
In my opinion the deepest and most difficult aspect of this example is the notion of 'clean' which will be different across contexts. Abstractions of this kind are not even close to understood in the human semantic system, and in fact are still minimally researched. (I expect much of the progress on this to come from robotics, in fact.)
I remember seeing a demonstration by a deep learning guy of a commercially available robot cleaning a house under remote control. You are seriously underestimating the difficulty of developing software to solve these problems in an integrated way.
This. It is a lot like the business guy thinking it is trivial to program a 'SaaS business' because he has a high level idea in his mind. Like all things programming the devil is in the detail.
The hardware is certainly good enough to assist a person with a disability living in a ranch house with typical household tasks, as demonstrated by human-in-the-loop operation.
We have rockets that can go to orbit, and we have submersibles that can visit the ocean floor. That does not mean the rocket-submarine problem is solved; doing both together is not the same problem as doing both separately.
The difference is a go AI can play billions of games and a simple 20 line C program can check, for each game, who won.
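It really is about that simple. Here's a hedged Python sketch of a Chinese-rules area scorer, assuming both players have passed and dead stones have already been removed (which is the genuinely fiddly part in practice):

```python
def score(board, size=19, komi=7.5):
    # board: dict {(x, y): 'B' or 'W'} for stones; empty points are simply absent.
    counts = {'B': 0, 'W': 0}
    for colour in board.values():                 # area scoring: stones count...
        counts[colour] += 1
    seen = set()
    for x in range(size):
        for y in range(size):
            if (x, y) in board or (x, y) in seen:
                continue
            region, borders, stack = set(), set(), [(x, y)]
            while stack:                          # flood-fill one empty region
                px, py = stack.pop()
                if (px, py) in region:
                    continue
                region.add((px, py))
                for nx, ny in ((px + 1, py), (px - 1, py), (px, py + 1), (px, py - 1)):
                    if 0 <= nx < size and 0 <= ny < size:
                        if (nx, ny) in board:
                            borders.add(board[(nx, ny)])
                        else:
                            stack.append((nx, ny))
            seen |= region
            if len(borders) == 1:                 # ...plus territory touching only one colour
                counts[borders.pop()] += len(region)
    margin = counts['B'] - counts['W'] - komi
    return ('B' if margin > 0 else 'W'), margin
```

That's the whole referee. The expensive part of self-play is generating the games, not judging them.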
For "cat in the picture", every picture must have the cat first identified by a person, so the training set is much smaller, and Google can't throw GPUs at the problem.
The absolute value of any Go board position is well-defined, and MCTS provides good computationally tractable approximations that get better as the rest of the system improves but already start better than random.
Check the Nature paper (and I think this is one of the biggest take-aways from AlphaGo Zero):
"Finally, it uses a simpler tree search that relies upon this single neural network to evaluate positions and sample moves, without performing any Monte Carlo rollouts."
In this new version, MCTS is not even used to evaluate a position! Speaking as a Go player, the ability for the neural network to accurately evaluate a position without "reading" ahead is phenomenal (again, read the Nature paper last page for details).
You don't even need to produce an AGI for this kind of intelligence to be frightening.
At some point, a military is going to develop autonomous weapons that are vastly superior to human beings on the battle field, with no risk of losing human lives, and there is going to be a blitzkrieg sort of situation as the relative power of nations shifts dramatically.
If we have two such countries we could have massive drone and cyberwars being fought faster than people even can comprehend what's happening.
Right now most countries insist on maintaining human control over the machinery of death. But that will only last for as long as autonomous death machines don't dominate the battlefield.
It's a fun challenge right now to build a machine that can win in Starcraft, but it's really a hop skip and a jump from there to winning actual wars.
In that case you just nuke the shit out of everybody or create an army of autonomous suicide bombers with nukes and biological and chemical weapons of all kinds. Once all humans are extinct, harmony on earth will be restored and everyone will live happily ever after.
I'm not sure a robot soldier is scarier than nukes. Generally speaking, if they are just single-task robots performing functions in dangerous situations, that seems like an improvement over risking human lives.
The core technique of AlphaGo is using tree search as a "policy improvement operator". Tree search doesn't work on most real-world tasks: the "game state" is too complex, there are too many choices, it's hard to predict the full effect of any choice you might make, and there often isn't even a "win" or "lose" state which would let you stop your self-play.
MCTS means "Monte-Carlo Tree Search". It's the core of the algorithm. The big difference is that it doesn't use rollouts, or random play: it chooses where to expand the tree based only on the neural network.
That's not what Monte Carlo Tree search is. The new version is still one neural network + MCTS. There's no way to store enough information to judge the efficiency of every possible move in a neural network, therefore a second algorithm to simulate outcomes is necessary.
If you read the paper, they do in fact still use Monte Carlo tree search. They just simplify its usage in conjunction with reducing the number of neural networks to one.
Tree search is also used during play. In the paper, they pit the pure neural net against other versions of the algorithm -- it ends up slightly worse than the version that played Fan Hui, at about 3000 ELO.
> AlphaGo Zero does not use “rollouts” - fast, random games used by other Go programs to predict which player will win from the current board position. Instead, it relies on its high quality neural networks to evaluate positions.
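To make the "no rollouts" point concrete, here's a hedged sketch of the two steps involved (a PUCT-style selection rule plus a network evaluation at the leaf; `net` is a stand-in that returns move priors and a value for a position, not DeepMind's actual interface):

```python
import math

class Node:
    def __init__(self, prior):
        self.prior, self.visits, self.value_sum = prior, 0, 0.0
        self.children = {}                        # move -> Node

    def q(self):                                  # mean value of this subtree so far
        return self.value_sum / self.visits if self.visits else 0.0

def select_child(node, c_puct=1.5):
    # Pick the child that balances its running value against the network's prior.
    total = sum(child.visits for child in node.children.values())
    def puct(child):
        return child.q() + c_puct * child.prior * math.sqrt(total + 1) / (1 + child.visits)
    return max(node.children.items(), key=lambda kv: puct(kv[1]))

def expand_and_evaluate(node, position, net):
    priors, value = net(position)                 # the network replaces the random rollout
    for move, p in priors.items():
        node.children[move] = Node(p)
    return value                                  # backed up the visited path instead of a playout result
```

Older engines would have played a fast random game from the leaf and backed up its result; here the value head's single number plays that role.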
This is an interesting question to ask in these "how far away is AGI" discussions:
I was once at a conference where there was a panel full of famous AI luminaries, and most of the luminaries were nodding and agreeing with each other that of course AGI was very far off, except for two famous AI luminaries who stayed quiet and let others take the microphone.
I got up in Q&A and said, “Okay, you’ve all told us that progress won’t be all that fast. But let’s be more concrete and specific. I’d like to know what’s the least impressive accomplishment that you are very confident cannot be done in the next two years.”
There was a silence.
Eventually, two people on the panel ventured replies, spoken in a rather more tentative tone than they’d been using to pronounce that AGI was decades out. They named “A robot puts away the dishes from a dishwasher without breaking them”, and Winograd schemas. Specifically, “I feel quite confident that the Winograd schemas—where we recently had a result that was in the 50, 60% range—in the next two years, we will not get 80, 90% on that regardless of the techniques people use.”
I spent an hour of my life that I'll never get back reading Yudkowsky's overly-long article and I believe I can summarise it thusly:
"We don't know how AGI will arise; we don't know when; we don't know why; we don't know anything at all about it and we won't know anything about it until it's too late to do anything anyway; we must act now!!"
The question is: if we don't know anything about this unknowable threat, how can we protect ourselves against it? In fact, since we're starting from 0 information, anything we do has equal chances of backfiring and bringing forth AGI as it has of actually preventing it. Yudkowsky is calling for random action, without direction and without reason.
Besides, if Yudkowsky is none the wiser about AGI than anyone else, then how is he so sure that AGI _will_ happen, as he insists it will?
Yudkowsky is fumbling around in the dark like everyone else in AI. Except he (and a few others) has decided that it's a good strategy, under the circumstances, to raise a hell of a racket. "It's dark!" he yells. "Beware of the darkness!" Yeah OK, friend. It's dark, we can all tell. Why don't you pipe down and let us find the damn light?
Sorry but I don't really see Yudkowsky's contributions as "fundamental research into AI safety". More like navel-gazing without any practical implications. At best, listening to him is just a waste of time. At worst, AGI is a real imminent threat and having people like him generating useless noise will make it harder for legitimate concerns to be heard, when the time comes.
Yes, I did and it's very bad form to go around asking people if they read the article. Try to remember that different people form different opinions from similar information.
Well, then you should have noticed what the article was about, which was not to detail a research program about AI safety.
Different articles can address different aspects of a problem without being accused of advocating "random action". That's just ridiculous.
>The question is: if we don't know anything about this unknowable threat, how can we protect ourselves against it? In fact, since we're starting from 0 information, anything we do has equal chances of backfiring and bringing forth AGI as it has of actually preventing it. Yudkowsky is calling for random action, without direction and without reason.
Are you sure you read the essay? That's literally the question he answers.
At any rate, we do have more than '0 information', and if you make an honest effort to think of what to do you can likely come up with better than 'random actions' for helping (as many have).
>> Are you sure you read the essay? That's literally the question he answers.
My reading of the article is that he keeps calling for action without specifying what that action should be and trying to justify it by saying he can't know what AGI would look like (so he can't really say what we can do to prevent it).
>> if you make an honest effort to think of what to do you can likely come up with better than 'random actions' for helping (as many have).
Sure. If my research gets up one day and starts self-improving at exponential rates I'll make sure to reach for th
... yeah, before reading that link my position was "Wow, that's super neat, but Go is a pretty well-defined game," and after reading it I remembered that my position maybe a year or two ago was "Chess is a well-defined game that's beatable by AI techniques but Go is acknowledged to be much harder and require actual intelligence to play and won't be solved for a long while" and now I'm worried. Thanks for posting that.
Go is still a well defined game within a limited space that doesn't change, and rules that don't change. It's just harder than Chess, but that doesn't make it similar to tons of real world tasks humans are better at.
That's probably true, but that's very much not what people were saying about Go a couple years ago. There were a lot of people talking about how there isn't a straightforward evaluation function of the quality of a given state of the board, how things need to be planned in advance, how there's much more combinatorial explosion than in chess, etc., to the point where it's a qualitatively different game.
For me, as someone who accepted and believed these claims about Go being qualitatively different, realizing that no, it's not qualitatively different (or that maybe it is, but not in a way that impedes state-of-the-art AI research) is increasing my skepticism in other claims that board games in general are qualitatively different from other tasks that AIs might get good at.
(If you didn't buy into these claims, then I commend you on your reasoning skills, carry on.)
About those claims: this is from Russell and Norvig, 3rd ed. (from 2003, so a way back):
Go is a deterministic game, but the large branching factor makes it challenging. The key issues and early literature in computer Go are summarized by Bouzy and Cazenave (2001) and Müller (2002). Up to 1997 there were no competent Go programs. Now the best programs play most of their moves at the master level; the only problem is that over the course of a game they usually make at least one serious blunder that allows a strong opponent to win. Whereas alpha-beta search reigns in most games, many recent Go programs have adopted Monte Carlo methods based on the UCT (upper confidence bounds on trees) scheme (Kocsis and Szepesvari, 2006). The strongest Go program as of 2009 is Gelly and Silver's MoGo (Wang and Gelly, 2007; Gelly and Silver, 2008). In August 2008, MoGo scored a surprising win against top professional Myungwan Kim, albeit with MoGo receiving a handicap of nine stones (about the equivalent of a queen handicap in chess). Kim estimated MoGo's strength at 2-3 dan, the low end of advanced amateur. For this match, MoGo was run on an 800-processor 15 teraflop supercomputer (1000 times Deep Blue). A few weeks later, MoGo, with only a five-stone handicap, won against a 6-dan professional. In the 9 x 9 form of Go, MoGo is at approximately the 1-dan professional level. Rapid advances are likely as experimentation continues with new forms of Monte Carlo search. The Computer Go Newsletter, published by the Computer Go Association, describes current developments.
There's no word about how Go is qualitatively different to other games, but maybe the referenced sources say something along those lines. Personally, I took a Masters course in AI two years ago, before AlphaGo, and I remember one professor saying that the last holdout where humans can still beat computers in board games was GO, but I don't quite remember him saying anything about a qualitative difference. Still, I can recall hearing about the idea that Go needs intuition or something like that, except I've no idea where I've heard that. I guess it might come from the popular press.
I guess this will sound a bit like the perennial excuse that "if it works, it's not AI" but my opinion about Go is that humans just weren't that good at it, after all. We may have thought that we have something special that makes us particularly good at Go, better than machines- but AlphaGo[Zero] has shown that, in the end, we just have no idea what it means to be really good at it (which, btw, is a damn good explanation of why it took us so long to make AI to beat us at it).
That, to my mind, is a much bigger and much more useful achievement than making a good AI game player. We can learn something from an insight into what we are capable of.
s/2003/2009/, I think, but the point stands. (Also I think I have the second edition at home and now I want to check what it says about Go.)
> my opinion about Go is that humans just weren't that good at it, after all. We may have thought that we have something special that makes us particularly good at Go, better than machines- but AlphaGo[Zero] has shown that, in the end, we just have no idea what it means to be really good at it (which, btw, is a damn good explanation of why it took us so long to make AI to beat us at it).
> the last holdout where humans can still beat computers in board games was GO
False, because nobody ever bothered to study modern boardgames rigorously.
Modern boardgames have small decision trees but very difficult evaluation functions. (Exactly opposite from computational games like Go.)
Modern boardgames can probably be solved by pure brute force calculation of all branches of the tree, but nobody knows if things like neural networks are any good for playing them.
In AI, "board games" generally means classical board games (nim, chess, backgammon, go etc) and "card games" means classical card games (bridge, poker, etc). Russel & Norvig also discuss some less well-known games, like kriegspiel (wargame) if memory serves, but those are all classical at least in the sense that they are, well, quite old.
I've seen some AI research in more modern board games actually. I've read a couple of papers discussing the use of Monte Carlo Tree Search to solve creature combat in Magic: the Gathering and my own degree and Master's dissertation were about M:tG (my Master's was in AI and my degree dissertation was an AI system also).
I don't know that much about modern board games, besides collectible card games, but for CCGs in particular the game trees are not small. I once calculated the time complexity of traversing a full M:tG game tree as O(b^m * n^m) = 2.272461391808129337799800881135e+5564 (where b is the branching factor, m is the average number of moves in a game, and n is the number of possible deck permutations for a 60 card deck, taking into account cards included multiple times). And mine was probably a very conservative estimate.
Also, to my knowledge, neural nets have not been used for Magic-playing AI (or any other CCG-playing AI). What has been used is MCTS, on its own, without much success. The best AI I've seen incorporates some domain knowledge, in the form of card-specific strategies (how to play a given card).
There are some difficulties in using ANNs to make an M:tG AI. Primarily, the fact that a truly competent player should be able to pick up a card it's never seen before and play it correctly (or decide whether to include it in a deck, if the goal is to also address deck-building). For this, the AI player will need to have at least some understanding of M:tG's language (ability text). It is my understanding that other modern games equally require understanding some game context outside of the main rules, which complicates the traditional tactic of generating all possible moves, pruning some and choosing the best.
In any case what I meant to say is that people in AI have indeed considered other games besides the classical ones- but when we talk about "games" in AI we do mean the classics.
> but when we talk about "games" in AI we do mean the classics
Only because of inertia. There's nothing inherently special about "classics". Eventually somebody will branch out once Go and poker are mined out of paper and article opportunity.
Once we do then maybe some new, interesting algorithms will be found.
In principle, every game can be solved by storing all possible game states in a database. Where brute-force storing is impractical due to size concerns, compression tricks have to be used.
E.g., Go is a simple game because at the end, every one of the fixed number of board spaces is either +1, -1 or 0. Add them up and you know if you won. This means that every move is either "correct" or "incorrect"; the problem of classifying multidimensional objects into two classes is a problem that we're pretty good at now, and things like neural networks get the job done.
A slightly more complex game like Agricola has no "correct" and "incorrect" moves because it's not zero-sum; you can make an "incorrect" move and still win as long as your opponent is forced to make a relatively more "incorrect" move.
Not sure how much of a difference that makes, but what's certain is that by (effectively) solving Go we've only scratched the surface. It's not the end of research, only the beginning.
Sure. Research in game playing AI doesn't end with Go, or any other game. We may see more research in modern board games, now that we're slowly running out of the classics.
I think you're underestimating the amount of work and determination it took to get to where we are today, though (I mean your comment about "inertia"). Classic board games have the advantage of a long history and of being well understood (the uncertainty about optimal strategies in Go notwithstanding). Additionally, for at least some of them like chess, there are rich databases of entire games that can be used outright, without the AI player having to generate-and-test them in the process of training or playing.
The same is not true for modern games. On the one hand, modern board games like Agricola (or, dunno, Settlers or Carcassonne etc) don't have such an extensive and multi-national following as the classics so it's much harder to find a lot of data to train on (which is obviously important for machine-learning AI players). I had that problem when considering an M:tG AI trained with machine learning: I would have liked to find play-by-play data on professional games but there just isn't any (or where there is it's not enough, or it's not in any standardised format).
Finally, classic board games have a cultural significance that modern board games don't quite match, despite the huge popularity of CCGs like M:tG or Pokemon, or Eurogame hits like Settlers. Go, chess and backgammon in particular have tremendous historical significance in their respective areas of the world: chess in Eastern Europe, backgammon in the Middle East, Go in SE Asia. People go to special academies to learn them, master players are widely recognised, etc. You don't get that level of interest with modern board games, so there's less research interest in them, also.
People in game playing AI have been trying for a very long time to crack some games like Go and, recently, poker (not quite cracked yet). They didn't sit around twiddling their thumbs all those years, neither did they choose classical board games over modern ones just because they didn't have the imagination to think of the latter. In AI research, as in all research, you have to make progress before you can make more progress.
> Go is acknowledged to be much harder and require actual intelligence to play
No, Go is a much less intelligent[1] game. It has a huge decision tree and requires massive amounts of computation to play, but walking trees and counting is exactly what computers do well and what humans do poorly.
[1] 'Intelligence' here means exactly that which differentiates humans from calculators: the ability to infer new rules from old ones.
The smoke is when things like the same simulated robot that learned to run around like a mentally challenged person also learns to simulate throwing and can read very basic language.
It will seem quite stupid and inept at first. So people will dismiss it. But when they have a system with general inputs and outputs that can acquire multiple different skills, that will be an AGI, and we can grow its skills and knowledge past human level.
The hockey stick is lying horizontally though, instead of vertically. If it took 3 days to go from 0 to beating the top player in the world, I wouldn't have expected it to take 21 days to beat the next version. I guess something happens at the top levels of Go that makes training much harder.
On another note, I didn't look at the details closely but it seems AlphaGo Zero needed much less compute training time than Alpha Go Master. Could getting rid of any human inputs really make it that much more efficient? That implies it will be able to have an impact in many different areas, which is a bit scary...
(Updated - it took 3 days to beat the top player in the world.)
This type of curve is what I would expect out of machine learning. At first there is rapid improvement as it learns the easy lessons. The rate then slows down as further incremental improvements have less impact.
What is, perhaps, surprising is that human play happens to be relatively close to the asymptote. Although this could be explained by AlphaGo being the first system to beat humans: if its peak performance were orders of magnitude higher than humans', a weaker program would have already beaten us.
The horizontal hockey stick makes sense to me in terms of learning. Each added layer of understanding of a complex system could mean a potentially exponentially increasing difficulty.
I'm sure it's naive to jump to sci-fi conclusions just yet, but I admit it's equal parts fascinating and terrifying. The general message of the posts is that human knowledge is cute but not required to find new insights. Define the measure of success and momma AI will find the answer. At this point, the path to AGI is about who first defines its goals right and that seems... doable? Even scarier: We think the holy grail of AI is simulating a human being. The AI of the future might chuckle at that notion.
Wait for Alpha StarCraft for some real panic. So far RL-based methods have had limited success outside of simple games (not to say Go is simple, but rather that the presentation and control parts of the format are).
I'd like to see a StarCraft player AI that wins using a mere 1/10th of the effective actions per minute (EPM) of world class players.
To me it seems beating another player while using fewer actions indicates superior skill, understanding and/or intellect.
Not sure I agree with this fully. Certainly many actions used in a typical SC game are redundant, but there are reasons for it. Lag for one. If there's a possibility of lag or dropped packets, spamming a command will help nullify this problem.
The other is the entire reason for high APM, the stop/start problem. Pro players keep high APM so that when they actually need high EPM their muscle memory is already at full tilt. If you slow down your APM during lulls in the action it becomes harder to suddenly increase it when a fight happens.
Certainly that's an entirely human condition that a machine wouldn't need to worry about. But I'm not sure it means lack of skill.
You expressed my exact thoughts and I was about to link to the same insightful article. I guess my comment could've been shortened as a silent upvote, but I commented anyway.
Games are a joke compared with real life. The number of variables and rules is well defined in games, while in real life it is not. That is why AGI is not coming anytime soon.
> Previous versions of AlphaGo initially trained on thousands of human amateur and professional games to learn how to play Go. AlphaGo Zero skips this step and learns to play simply by playing games against itself, starting from completely random play.
So technically this version has lost every game it's ever won.
Temporal difference learning was previously considered weak at 'tactical' games, i.e. ones with game states that require long chains of precise moves to improve position (like many checkmate scenarios in chess).
For anyone more familiar with this technique, is it clear how the MCTS/checkpoint system overcomes this? How sensitive is the system to the tuning params for those parts of the algorithm? Like, is Go a particularly good candidate because the ~400 play positions result in a (relatively) small tree search requirement? (I kinda can't believe I'm saying that Go has 'a small search tree'!)
We use TD learning for the AI in our game Race for the Galaxy, so it's neat to hear about possible avenues for improvement!
After digging a bit deeper into the paper, it seems a key part of the new scheme is that the NN is trained to help guide a deep/sparse tree search (as opposed to TD-Gammon's fully exhaustive 2-ply search). It's somewhat surprising to me that the simple win/loss signal is strong enough to train this very 'intermediate step' in the algorithm - a spectacular result! It raises the question of what other heuristic-based algorithms would be improved by replacing a hand-rolled non-optimal heuristic function with a NN.
It's estimating the probability of winning from the position based on what it has already seen. So basically it's a giant conditional probability distribution. Is it mistaken to interpret this as a bayesian network?
Wow, that was a really deep and enjoyable Wikipedia rabbit hole journey. I hadn't heard of Temporal Difference before (though I was familiar with Q-learning).
It was interesting to note that TD-Gammon improved with expert designed features. I wonder if this was simply related to the technology of the field as it stood over 20 years ago or some underlying categorization or complexity associated with the games themselves (backgammon being more favorable to human comprehension than Go in this case).
> Even though TD-Gammon discovered insightful features on its own, Tesauro wondered if its play could be improved by using hand-designed features like Neurogammon's. Indeed, the self-training TD-Gammon with expert-designed features soon surpassed all previous computer backgammon programs. It stopped improving after about 1,500,000 games (self-play) using 80 hidden units.
For others: Richard Sutton, one of the pioneers of TD makes his Reinforcement Learning: An Introduction textbook available for free on his website: http://incompleteideas.net/sutton/ (MIT Press also links to it)
Yeah it's not clear to me why temporal difference learning all of a sudden works so well here? Is it the case that nobody had really tried it for learning a policy for Go with a strong NN architecture? In the Methods they mention TD learning for value functions but I don't see anything about policies.
edit: OK, they're calling it policy iteration as opposed to TD learning. I guess I don't get the difference.
TD learning is, in some sense, a component of policy iteration. TD learning is about learning the value function for a given policy. In policy iteration you use a value function to decide how to update the policy for which the value function was estimated, and you iterate between the "learn value" and "update policy" steps.
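A minimal tabular sketch of the distinction (toy code, nothing AlphaGo-specific; `step(s, a) -> (reward, next_state)` and the state/action sets are assumed environment hooks, not any particular library):

```python
def td0_update(V, s, r, s_next, alpha=0.1, gamma=1.0):
    # TD(0): nudge V(s) toward the bootstrapped target r + gamma * V(s'),
    # learning the value function of whatever policy generated the transition.
    V[s] += alpha * (r + gamma * V[s_next] - V[s])

def greedy_improvement(V, actions, step, gamma=1.0):
    # Policy improvement: act greedily with respect to the current value estimate.
    def policy(s):
        def lookahead(a):
            r, s_next = step(s, a)
            return r + gamma * V[s_next]
        return max(actions, key=lookahead)
    return policy
```

Policy iteration alternates those two steps until nothing changes. AlphaGo Zero's twist, as described elsewhere in the thread, is that the "improvement" step is a full MCTS search rather than a one-step greedy max.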
It's my opinion that TD-Gammon succeeded back in the 1990s because backgammon is a one-dimensional board. It didn't need the convolutional techniques of the Go neural nets to gain insight into the game and could thus be handled by a traditional neural net.
I meant that tongue-in-cheek based on the "by playing games against itself" during training. Nonetheless, thanks for clarifying that in case it's unclear for others (and for the SGFs).
I remember reading about Blondie24, a program that learned to play checkers at a high level without human input. It was based on neural network and genetic algorithm technology. From the Wikipedia entry: "The significance of the Blondie24 program is that its ability to play checkers did not rely on any human expertise of the game. Rather, it came solely from the total points earned by each player and the evolutionary process itself." [1].
In addition to numerous journal articles, the creators wrote a lay-person book on their creation: Blondie24: playing at the edge of AI, by David B. Fogel [2].
"It uses one neural network rather than two."
and
"AlphaGo Zero only uses the black and white stones from the Go board as its input, whereas previous versions of AlphaGo included a small number of hand-engineered features."
This is amazing! The technology they came up with must be super generic.
Also, unsupervised. Also, no rollouts. They got rid of a lot of complexity. At this point it looks like a reasonable challenge to write a superhuman Go AI in 500 lines of unobfuscated python.
I was wondering about this: can we study AlphaGo Zero and other nets created in the same way for similarities, extract and study them? Or are we limited to observing the behavior and learning from that?
I will be interested to see what kind of algorithms they have used to allow AlphaGo to learn from its own moves. Are these pretty generic algos, or are these very customized and specific ones that only apply to AlphaGo and the game of Go?
They have a new reinforcement learning algorithm that should be generically applicable to anything where a long sequence of moves results in a specifically gradable outcome.
> The neural network in AlphaGo Zero is trained from games of self-play by a novel reinforcement learning algorithm. In each position s, an MCTS search is executed, guided by the neural network fθ. The MCTS search outputs probabilities π of playing each move. These search probabilities usually select much stronger moves than the raw move probabilities p of the neural network fθ(s); MCTS may therefore be viewed as a powerful policy improvement operator. Self-play with search—using the improved MCTS-based policy to select each move, then using the game winner z as a sample of the value—may be viewed as a powerful policy evaluation operator. The main idea of our reinforcement learning algorithm is to use these search operators repeatedly in a policy iteration procedure: the neural network's parameters are updated to make the move probabilities and value (p, v) = fθ(s) more closely match the improved search probabilities and self-play winner (π, z); these new parameters are used in the next iteration of self-play to make the search even stronger.
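A rough sketch of that loop in Python, assuming a `game` object for the rules, a network `net`, and a `run_mcts` function returning visit-count probabilities (all names are mine); the point is just the shape of "search as policy improvement, self-play outcome as policy evaluation":

```python
import numpy as np

def self_play_games(game, net, run_mcts, num_games=100):
    """Generate (state, pi, z) training examples by playing the current network
    against itself, using MCTS visit counts pi as the improved policy target."""
    examples = []
    for _ in range(num_games):
        state, history = game.initial_state(), []
        while game.winner(state) is None:
            pi = run_mcts(state, net)               # search probabilities, stronger than raw net output
            history.append((state, pi))
            move = np.random.choice(len(pi), p=pi)  # sample a move from the search policy
            state = game.next_state(state, move)
        z = game.winner(state)                      # +1 / -1; sign-flipping per player to move omitted here
        examples.extend((s, p, z) for s, p in history)
    return examples

# Training then just pulls (p, v) = net(s) toward (pi, z) on these examples
# and repeats: stronger net -> stronger search -> better targets.
```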
> They have a new reinforcement learning algorithm that should be generically applicable to anything where a long sequence of moves results in a specifically gradable outcome.
Statements like these always make me wonder why certain obvious things weren't tried. If it's so generic, why wasn't it tried on Chess? Or was it tried, failed to impress and thus didn't make it into the press release?
This is a big problem with all these public discussions on AI. Almost no one speaks about algorithm failures. I haven't seen a single research paper that said "oh, and we also tried the algorithm in domain X and it totally sucked".
The conventional wisdom for Chess engines is that aggressive pruning doesn't work well. Chess is much more tactical than Go, selective algorithms tend to lead to some crucial tactic being missed, and the greater the search depth, the more likely that is.
Modern Chess engines are designed to brute-force the search tree as efficiently as possible. I will go out on a limb here and say they would wipe the floor with AlphaGo, because AlphaGo's hardware would be more of a liability than an asset against a CPU.
Until I see AlphaGo Zero defeating Stockfish 100-0, and with the same algorithm defeating the best Go AI and killing the Atari games including Montezuma's Revenge, I call this hype bullshit.
Give me your results on OpenAI gym in a variety of different styles of games including GTA and WoW. I will believe you if a generic unsupervised algorithm running on a single machine is absolutely destroying the best players.
Just like Lee Se-dol is a Go grandmaster, beats Garry Kasparov at chess and can also get a perfect score in Pac-Man, right? I mean, if you can't do all of those things then are you even a human-level intelligence?
This just illustrates that surpassing "human level" performance is a silly and arbitrary benchmark, because there is no such thing as general human level performance. But I bet Kasparov would be pretty good at Go, and Sedol would be pretty good at chess.
Universality is the real hard problem of AI. In the long run, a mediocre AI that does a lot of different things is far more useful than most targeted "superhuman" AIs. Most domains simply don't require better-than-human performance, but could still reap tremendous benefits from automation.
Agreed. It's great that we have domain-specific approaches that can beat humans in their domain (and that we're learning how to make these approaches more generic so that, with re-training, they can adapt to new domains), but the real "oh snap" moment will be when we build something that's barely-adequate but widely adaptable. Something with the adaptability of a corvid or an octopus, say. If we get to that level, it'll mean we've discovered the "universal glue" that joins specialist networks together into a conscious entity.
You forgot to add "running on 20 watts of power". It's not reasonable to require it to run on a single machine, when brain performance is estimated to be more than 10 petaflops.
Doesn't this sound very much like how a human learns to play the game? MCTS ~ play/experience (move probabilities); self-play with search ~ study/analysis (move evaluation); repetition and iteration to build intuition (NN parameters).
From the paper: "it uses a single neural network, rather
than separate policy and value networks ... it uses a simpler tree search that relies upon this
single neural network to evaluate positions and sample moves, without performing any MonteCarlo
rollouts. To achieve these results, we introduce a new reinforcement learning algorithm that
incorporates lookahead search inside the training loop, resulting in rapid improvement and precise
and stable learning."
It also presumes that one can simulate the world at low cost. In AlphaGo Zero it takes 0.4 s for 1,600 node expansions, but in this case the cost of simulating the world is negligible. Anyway, assuming you need that many node expansions to get decent-quality updates, that puts a rather tight limit on the cost of simulating the world.
DM has already done a bunch of work on 'deep models' of environments to plan over. Use them and you have 'model-predictive control' and planning, and this tree extension to policy gradients would work as well (probably). It could be pretty interesting to see what would happen if you tried that sort of hybrid on ALE.
I guess deep world models are still severely riddled by all sorts of problems: vanishing gradients, BPTT being O(T), poor generalization ability of NNs (which likely is due to the lack of attractor state associative recall, as well as concept composability), lack of probabilistic message passing to deal with uncertainty, and perhaps some priors about the world are necessary to make learning tractable (such as spatial maps and fine-tuning for time scales that contain interesting information).
I'm wondering whether, once one of these algorithms comes along and has been perfected, it is going to "burn in" the domain it was built for as the target of problem reductions, similar to 8086 assembly or the QWERTY keyboard living on today despite being ancient relics.
For example, after this result it seems that if you can reduce your problem domain onto Go (or a similarly structured game) you now have a way to create a superhuman solver. It may just be easier to do that than to try to figure out how to design and tune a new network.
I could imagine waking up in 10 years being confused at why all software efforts in the AI space are focused on just figuring out clever ways to map real problems onto a hodgepodge of seemingly random "toy" domains like Go and Chess and Starcraft. Hell, maybe the Starcraft bot will immortalize Starcraft in a way the game never would have been able to if it becomes a good reduction target for a lot of domains.
It kind of reminds me of how SVMs were "abused" by twisting non-linear domains into them via kernel methods, or of how problems get reduced onto 3-SAT so a SAT solver can be thrown at them, or how ImageNet's weights are being re-purposed for other image-oriented prediction tasks.
In many domains mapping the problem to a tree search already gives you a superhuman solver or at least a passable solver. Problem mapping is what most of modern AI research is about. That's how the field was redefined in recent years. Just like Vladimir Vapnik says[1], it's becoming more engineering than science. (And sometimes more software alchemy than engineering.)
Looks like the performance improvement comes from two key ingredients:
1) Using Residual networks instead of normal convolutional layers
2) Using a smarter policy training loss that uses the full information from a MCTS at each move. In the previous version, I believe they just ran the policy network to the end of the game and used a very weak {0, 1} reinforcement signal over all of the moves played. Here, it looks like they use each run of MCTS to provide a fully supervised signal over all moves it explores.
How is it different to apply the loss on each actual move at the end of the game VS on each rollout (which is itself a tiny game)?
Does it help reinforce learning towards the end game as shorter rollouts are needed? Is the more accurate information then propagated to earlier moves as well?
I think the difference is that under 1/0 policy gradient loss, it gets feedback only on the actual chosen move. Under MCTS-rollouts-each-move, it gets feedback on every move on the board whether its value estimate was slightly too high or low plus the ultimate outcome of the 1 move it did make.
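For reference, the combined objective in the paper is a squared error on the value plus a cross-entropy between the network's move probabilities and the MCTS visit counts, with L2 regularisation. A numpy sketch (variable names are mine):

```python
import numpy as np

def alphago_zero_loss(p, v, pi, z, weights=None, c=1e-4):
    """(z - v)^2  -  pi . log(p)  +  c * ||theta||^2

    p  : network move probabilities for the position
    v  : network value estimate (scalar in [-1, 1])
    pi : MCTS search probabilities (the improved policy target)
    z  : self-play game outcome from the current player's perspective
    """
    value_loss = (z - v) ** 2
    policy_loss = -np.sum(pi * np.log(p + 1e-12))   # cross-entropy toward the search policy
    l2 = c * np.sum(weights ** 2) if weights is not None else 0.0
    return value_loss + policy_loss + l2
```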
"AlphaGo Zero is the program described in this paper. It learns from self-play reinforcement
learning, starting from random initial weights, without using rollouts, with no human supervision,
and using only the raw board history as input features. It uses just a single machine
in the Google Cloud with 4 TPUs (AlphaGo Zero could also be distributed but we chose to
use the simplest possible search algorithm)."
I remember reading ages ago in Scientific American about a much more interesting (and useful) AI application of this technique.
Genetic algorithms were used to evolve new, more efficient variants of existing electronic circuits. I dug it up - it was: https://www.scientificamerican.com/magazine/sa/2003/02-01/#a...
Article "Evolving inventions". I have no idea if there is an open-access version anywhere.
As far as I remember, that approach led to some patents, because some of the inventions were better than existing solutions. One of the examples in the article was a low-pass filter (I don't remember if the AI version was actually better or worse than the human-made one).
The essential element of this approach was that in electronics (as in Go) there exists a well-defined set of rules, which allows researchers to build a simulation engine with an optimization/evaluation function that the AI targets by itself, without supervision. It's great to see that this approach is still alive, although in my humble opinion, the application in electronics is much more interesting than Go.
Very impressive, the original implementation relied a lot on feature engineering.
I'm surprised they're able to prevent a self-play equilibrium with such a simple loss function.
It's sort of like they are using auxiliary outputs, but instead of using them to fit features, they are fitting to multiple ways of arriving at 'best play': predicting the value (SL) and predicting the probability of the best outcome (RL). In principle they're doing the same thing, but in practice it seems like they make up for each other's shortcomings (e.g. the self-play equilibrium problem with RL).
> If similar techniques can be applied to other structured problems, such as protein folding, reducing energy consumption or searching for revolutionary new materials,
Protein folding sounds like a nice idea for their next challenge.
When things will start getting interesting is when we figure out how to get move simulation and search into the network itself, rather than programming that on the outside. As far as I know, no-one has even the faintest idea of how to do that. We have an existence proof that this should be possible.
The networks are great at perception and snap-prediction. Anything a human can do in 200ms is fair game. And with clever engineering, we can make magic happen by iterating or integrating those things.
But it's after that first 200ms that humans get really intelligent. When we can come up with an architecture that lets the networks themselves start simulating possibilities, backtracking, deciding when to answer now or to think more -- when the network owns the loop -- then it will get interesting.
> We have an existence proof that this should be possible.
Not guaranteed. The human brain has diffusion signalling (i.e. neurotransmitters passing out of the synaptic cleft, into a neighbouring one, and activating a receptor on some other spatially local axon as a result). And one of those signalling molecules is thought to represent, in its intensity, a confidence-interval bias adjustment (i.e. a pruning bias factor for MCTS). So the brain's MCTS-equivalent process may rely on some extra-graphical properties of the brain-as-embodied-meat-thing.
"Neighbouring" is defined in terms of embedding in a metric space and inverse-cube diffusion, rather than anything to do with graph connectivity.
Also, these signals pile up in the synaptic cleft until they’re picked up, so it’s not just about instantaneous transmissivity as if these were radio signals.
But also also, other stuff like monoamine oxidase is floating about in its own diffusion patterns, cleaning up these signals.
It’s basically like a “scent” communication embodied-actor model, but a very complex one where things like redox reactions with the atmosphere occur.
Oh, and there are "secondary messengers": signals that trigger other signals that, among other things, inhibit the release of the original signal when received back at the sender, such that a dynamic equilibrium state is reached between the two signal types.
It would be very interesting to see whether it could handle the much more advanced and highly tuned engines that exist for chess, a game with considerably more complicated rules.
It'd be particularly useful to have a chess bot that can play badly in the same way a human does.
The problem with the current chess bots is that they play badly, badly. They choose a terrible random mistake to make every few moves, while some of their other moves are brilliant. They cannot accurately mimic beginner or intermediate level players.
This seems like something DeepMind could create, given the incentive. They were able to train AlphaGo to predict human moves in Go at a very high accuracy (obviously not with AlphaGo Zero, but the inferior human-predictive version is how they determined that AGZ is playing qualitatively differently).
I have some idea how it MIGHT work, but it would be a very boring solution involving 'learning' Stockfish's parameters and HOPING to find improvements to something like integrating time management and search/pruning into it.
I wouldn't bet on it though. SMP is notoriously hard to make work with alpha-beta search, and there are a lot of clever tricks involved (which are probably still not perfect). Maybe with ASICs you could make it stronger, but then it wouldn't be as fair a comparison.
Well, all top engines did some kind of search on parameters, not sure if you can find much improvement there.
I'm talking about something similar to what's described in the paper: a 100% self-learned solution without human heuristics, based on NNs. That could bring totally new ideas into chess.
Shogi is probably the closest historical game in terms of complexity to Go. Some of the larger variants might exceed Go's complexity if played with drops, though that's not normally done. And Go played on a 9x9 board (like standard Shogi) has a substantially lower state space complexity (and almost certainly lower by other measures as well.)
But shogi is much more obscure outside of Japan than go or chess, so it gets less interest, especially in the large-board variants.
I think that the existence of highly optimized chess AI makes it interesting from two angles:
1) Generalization: Can one make an AI using the same approach that can play both chess and Go at superhuman levels?
2) Efficiency: Can these newer methods also match or outperform in terms of compute/energy cost?
But maybe not sexy enough, or we just don't hear about it as much.
That makes it even more interesting. I think it would be very notable and significant if a neural network with MCTS and self-play reinforcement learning could surpass Stockfish, which has superhuman strength but was developed with an utterly different approach involving lots of human guidance and grandmaster input.
Giraffe attempted this (with more standard tree search than MCTS and with only a value function rather than a combined policy/value network), but only reached IM level -- certainly impressive, but nowhere close to Stockfish.
Demis Hassabis was asked this in a Q&A after a talk he gave, and according to him someone did this (bootstrapped a chess engine from self-play) successfully while still being a student, and was hired by them subsequently.
I didn't see the talk, but I'm guessing he was referring to the Giraffe engine done by Matthew Lai (https://arxiv.org/abs/1509.01549). The main thing there is that he only learns an evaluation function, not a policy. Giraffe still uses classical alpha-beta search over the full action space. AFAIK nobody has learned a decent policy network for chess, probably because 1) it's super tactical, and 2) nobody cares that much because alpha-beta is so strong
Minimax with Alpha-Beta pruning works in Chess because the search tree is way smaller. The reason all this "Monte-Carlo Tree Search + Neural Nets" machinery is being used in Go is that Minimax + Alpha-Beta pruning DOESN'T work in Go.
That's still 10 times as much energy as a human body or 100 times as much as a human brain. But yeah, it's not like they're throwing a datacenter at this.
Is AlphaGo Zero the first Go program without special code to read ladders? I'm curious how a pure neural net can read them, given how non-local they are.
The concept of locality is nothing but a human weakness in Go, the best AI must read the whole board with every move.
EDIT: From the paper: "Surprisingly, shicho (“ladder” capture sequences that may span the whole board) – one of the first elements of Go knowledge learned by humans – were only understood by AlphaGo Zero much later in training." I'm surprised by the author's use of the word "Surprisingly" here.
AlphaGo is still based around layers of 3×3 local convolutions.
That represents a strong assumption about locality in the network design. I would expect AlphaGo to perform poorly on the game "Go with the vertices randomly permuted".
>Surprisingly, shicho (‘ladder’ capture sequences that may span the whole board)—one of the first elements of Go knowledge learned by humans—were only understood by AlphaGo Zero much later in training. [0]
The catch is that this isn't quite zero human knowledge, since the tree search algorithm is a human discovery, and not one that came easily to humans. It also massively cuts down on the search space for an appropriate policy function.
That means that this setup isn't necessarily general. How applicable is MCTS to games with asymmetric information, a la Starcraft? What about games that can't quite be modeled with an alternating turn-based game tree like bughouse?
There's a Dota 2 bot by OpenAI that played games against itself and managed to beat a lot of pros in the scene. It's still SF mid only, no runes, and some restricted items, but it shows that there is also potential for Starcraft.
That's not quite what they're talking about WRT zero human knowledge.
The problem is that there's no intrinsic scoring system for Go, nothing specific to maximize, so it's difficult to tell a computer whether a given outcome is "good" or "bad". So early versions of AlphaGo used a collection of human-played Go games to get an idea of what constitutes "good" and what is "bad", so it can then train its model to predict whether a move will make things better or worse.
This new system forgoes that step, and instead has the model play itself starting at random and looking for patterns that end up winning games. It's as if you gave the rules to the game of Go to a culture that's never heard of it before, and they evolved their own play style entirely in isolation.
Their result is a model that is better than the one that was developed with human influence, and that's the interesting bit.
I understand that the paper means that they didn't train it on expert input. The significance of the research is that this is a more general way to construct a game AI. The question I am posing is how far we have to go on that front.
Yes. It'll be interesting to see if their starcraft project uses the same algorithms or not. Note that the link merely describes software that could be used for feature engineering. It doesn't describe what NN architecture or tree search algorithms deep mind is using.
> What about games that can't quite be modeled with an alternating turn-based game tree like bughouse?
Train a network which predicts future state of the game, given current state and input. Train a network which generates sensible inputs, given current state. Use MCTS.
Bughouse, starcraft, and other important games need to be modeled as simultaneous-decision games. Plain-vanilla MCTS is designed for alternating-decision games.
To see why this is important, consider why min-max (which MCTS approximates) actually works. At any given point, the equilibrium strategy for the player to move is the move that maximizes their payoff, and the utility for each move can be found recursively.
In simultaneous decision games, calculating the equilibrium strategy (which may even be a mixed strategy) is more complicated. See http://mlanctot.info/files/papers/cig14-smmctsggp.pdf for various ways in which MCTS can be extended to simultaneous-decision games.
It'll be interesting to see if DeepMind picks up a search algorithm someone else has researched, or if they come up with something entirely new.
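A toy illustration of the simultaneous-move problem, separate from anything in the paper: in matching pennies no pure strategy is stable, so a search that commits to a single "best" reply is exploitable; even naive fictitious play recovers the 50/50 mixed equilibrium.

```python
import numpy as np

# Matching pennies: both players move at once; the row player wins (+1) if the
# coins match and loses (-1) otherwise. There is no pure-strategy equilibrium.
payoff = np.array([[1, -1],
                   [-1, 1]])

counts_row, counts_col = np.ones(2), np.ones(2)
for _ in range(10000):
    # Fictitious play: each side best-responds to the opponent's empirical mixture.
    row = np.argmax(payoff @ (counts_col / counts_col.sum()))
    col = np.argmin((counts_row / counts_row.sum()) @ payoff)
    counts_row[row] += 1
    counts_col[col] += 1

print(counts_row / counts_row.sum(), counts_col / counts_col.sum())  # both approach [0.5, 0.5]
```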
And he would have still said that deep learning lacks any sort of common sense understanding that's necessary to get close to human level intelligence.
I'd certainly be surprised if I ever woke up from being cryopreserved. Which isn't to say that I'd object to the process if I had the disposable income and an understanding/cooperative family support structure, which I do not.
I wonder if it would be more popular if the cost was reduced to something similar to a regular funeral. It seems it might be a more cheery send off even if the chances of it working are questionable.
You can already buy an insurance plan that will pay for it in some states, and that's reasonably priced. In my case religious family members would never let it go down though even if the finances were solved.
One idea that occurs to me is to now evolve the Go game itself in a direction that adds more challenges for an AI to solve, and then solve those problems. How about being able to handle different and randomized board shapes? How about being allowed to name one move the opponent cannot make when you play a stone? It would be interesting to keep track of which variations the algorithm handles well automatically, and which it falls flat on, etc.
Like Arimaa, some other games were (at least partially) designed to be hard for computers: Havannah [1] and Octi [2]. Havannah has since been defeated by the machines. Octi remains unchallenged, but that is probably due to its obscurity.
This is such an impressive result, and so general, I bet many people (including me) wish they knew exactly how to duplicate this result. It would be great if they created an online course that explained all algorithms in detail right up to the creation of AlphaGo Zero itself. The paper gives the impression that it shouldn't be too hard for them to create such a course.
General AI relative to an individual human, or billions of humans? The sum total of human beings, or organizations of humans is superhuman relative to an individual. We've had superhuman organizations for millennia. I'm not sure how much general AI will be different, other than the large scale automation of jobs which would happen.
As Rodney Brooks pointed out, all technology happens within a context, not a vacuum. A general AI will come to exist in a world with a lot of other superhuman capabilities already in existence.
One of the more interesting things the success of "starting with zero" suggests is that the idea that some mystical "human consciousness" is the end goal for AI might be laughable in the long term. AI might just casually bypass human consciousness, say "oh, hi!" and wave us goodbye a day later. Also, a factor of 7 billion "happens" in computer science.
This is getting rather creepy to think of, even if it's still science fiction. At this point, I could see a computer that out-thinks humanity within decades. What would it think? What would we even do with its findings? Would we understand it? Would it understand itself? Would it know how to manipulate us?
If you multiply X by the amount of time it takes, on average, for a human to make a move... How many human lifetimes did Zero take to get to superhuman?
Amazing results, though I am somewhat frightened by how generic this model is and how it achieved such amazing results. I can't help but think that these same techniques can be used to learn how humans react in certain situations and how they can, very subtly, be nudged to think in a certain way - one that fits the agenda of whatever party is behind it.
With the mass surveillance that is Google, it's quite doable to test for human reactions to certain things. They've got the tools to execute a certain plan and evaluate its effectiveness. Of course it can also go in a benevolent way: like, what kind of policy will benefit the most people? (semantics of 'benefiting' aside)
I at least certainly hope these kinds of generic algorithms will be used to generate effective, meaningful policies that truly help people. Still a far-away future but one that gets closer by the day.
I'd only worry if it can outperform humans when there are no rules per se. That is, if I put a queen down on the Go board, start knocking off stones, move three times a turn, then take a lighter and burn the Go board, and the AI responds by decapitating me.
Ha! I do wonder about using a board game where the rules periodically change in simple ways at random. A human could easily adapt to the rule changes while playing and adjust their strategy accordingly. Would a Deep Learning algorithm be able to do this?
If we keep the board and pieces digital, then the board could change shape, the pieces could change color indicating a random association with a rule change, and what not.
What's fascinating (and admittedly somewhat worrying) about self-play is that an agent can accidentally become adept at tasks other than the intended one via transfer learning. The "wrestling spiders" in OpenAI's demo quickly mastered the art of sumo wrestling. And whatever skills they learned in resisting an opposing force to stay standing on a platform were immediately applicable to myriad different domains - in this case, being subjected to hurricane-force winds and not, as any normal spider would be, hurled into the sky!
It's more difficult to see how Go playing skills can translate to other domains. But for tasks in robotics, cybersecurity or fintech the power of self-play trained transfer learning becomes more apparent.
It is clear that these "self-play" scenarios depend on simulation - unless there is an appropriate stage for self play to take place on, there can be no play. The question is - how do we stand with simulation for robotics, self driving cars, etc.
My bet is that simulation is going to be the crowning jewel in the AI field, replacing static datasets and supervised learning with "dynamic datasets" and rewards. It would help with data sparsity as well (where can you find an image of a donkey riding an elephant for the new ImageNet? - but you can sim that or any possible combination).
Not to mention that humans have fallen head over heels for simulation as well - VR headsets and games in general. I see a great future for simulation with both AI and humans. It will be our common learning/playing/research sandbox.
Would be nice if there was an open-source attempt at an AlphaGo clone on a 9x9 board, so it could be run on commodity hardware and maybe trained in more reasonable time. It would also be interesting to see if a human would still win on a 190x190 or some arbitrary-size board against an appropriately trained AlphaGo Zero.
Is this evidence of a broader leap forward in machine learning, or are these advancements domain-specific? In other words, could these innovations be applied to other fields and applications?
I haven't read the paper yet but "AlphaGo Zero only uses the black and white stones from the Go board as its input, whereas previous versions of AlphaGo included a small number of hand-engineered features."
I think the fact that it's no longer using Monte Carlo tree search is a huge step forward in the generalizability of the technique. But go is still
- a perfect information game
- with a relatively small input size (vs. arbitrary computer vision)
- cheap to simulate
- discrete action space
- deterministic
This isn't to take away from the magnitude of the achievement, but the nature of the problem itself makes the result less applicable to many tasks we might want to use RL for.
Math research shares those qualities, except small input size if you include the body of all the already-known theorems as an input. I don't know if we'll see much smarter proof assistants soon, but it doesn't seem absurd to me as a possible development.
Actually, just converting all the already-known theorems into a form that can be computationally verified (not just convince a skilled human) would be an interesting starting point. This would really help the metamath project, and perhaps make peer review of mathematical research papers easier:
It still uses MCTS as its search algorithm. It no longer uses random rollouts as part of the evaluation, though. (Previously it was rollouts/2 + value_network/2)
Random rollouts are what the MC in MCTS stands for.
Without that, it is simply a tree search.
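In leaf-evaluation terms, the change described just above is roughly this (a sketch; `value_net` and `rollout` are placeholder callables I've named, and lam=0.5 is the earlier half-and-half mixing):

```python
def evaluate_leaf(state, value_net, rollout=None, lam=0.5):
    """Earlier AlphaGo versions mixed a fast rollout result with the value network;
    AlphaGo Zero drops the rollout entirely and trusts the value head alone."""
    v = value_net(state)
    if rollout is None:          # AlphaGo Zero style: no Monte-Carlo rollouts
        return v
    z = rollout(state)           # fast policy plays the position out, returns +1 / -1
    return (1 - lam) * v + lam * z
```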
Excerpt from the paper:
> [AlphaGo Zero] uses a simpler tree search that relies upon this single neural network to evaluate positions and sample moves, without performing any Monte-Carlo rollouts.
Does this mean it learns what to search? I wonder why they thought it was a good idea. I thought the whole point of MC was that pruning algorithms like the ones in chess wouldn't work for a larger search space.
The policy network is a function from board states to a scoring of moves. The policy network with the greedy heuristic, i.e. pick the highest-rated move with no explicit lookahead, plays at a high amateur level.
This was... unexpectedly good.
It effectively reduces the branching factor of Go from the number of moves available, to the number of moves actually worth considering.
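Concretely, selection inside the tree weights every candidate move by the policy network's prior, so moves the network considers implausible are rarely visited at all. A sketch of that PUCT-style rule (the array names and the +1 under the square root are my choices, not the paper's exact constants):

```python
import numpy as np

def select_move(Q, N, P, c_puct=1.0):
    """Pick the move maximizing Q + U, where U boosts moves the policy head
    likes (high prior P) that haven't been visited much yet (low count N).

    Q : mean action value per move
    N : visit count per move
    P : prior probability per move from the policy network
    """
    U = c_puct * P * np.sqrt(N.sum() + 1) / (1.0 + N)
    return int(np.argmax(Q + U))
```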
Extended Data Figure 2 contains something really cool: how long it takes for AG0 to discover a joseki, and how long it takes it to later discard it as non-joseki. So you could, in theory, evaluate a human joseki by plotting AG0's probability of playing it against training time (or against its Elo rating). What's also cool is that the differences that cause it to stop playing a joseki must be really minute, but it can see them anyway.
I think that we cannot fully understand the implications of that in our present/near future. The use of the technology will make the difference. There is a small book on Amazon called something like "AlphaGo Zero - 10 prophecies for the world". Interesting point of view on the use of it.
> The system starts off with a neural network that knows nothing about the game of Go. It then plays games against itself
I might have missed this, but: Where are the actual rules of Go encoded? Mustn't there be some enumeration of what constitutes "capturing," how the win condition of the game is calculated, correct?
If you give a list of "possible next states" from any given state, that should encompass all the rules. If there's a rule that's preventing you from doing something, it's not a possible state.
By nothing they mean hints about what constitutes "good Go strategy". But it implicitly knows all of the rules of go.
Surely if they are really starting with "zero", then all the AI is given is the arrangement of stones on the board (which starts empty) with the opportunity to select a position for its next stone after its opponent has placed one, until the game is over. (Let's assume that there is another piece of software responsible for determining when the game has finished, and who has won). As such, the only "rules" the AI needs are that it can only place one stone at a time, only in an empty position, and only when it is not the opponent's turn.
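Put differently, everything the learner is allowed to know about Go can sit behind an interface like this (names are mine; only legal successors and the final outcome are exposed, nothing about strategy):

```python
from typing import Iterable, Optional

class GoRules:
    """The only game knowledge exposed to the learning system, as the comments
    above describe it: legal successors and the final outcome, no strategy hints."""

    def initial_state(self):
        """An empty board, black to move."""

    def legal_moves(self, state) -> Iterable[int]:
        """Empty points that are legal to play (ko and suicide rules enforced here)."""

    def next_state(self, state, move):
        """Board after placing the stone; captures are resolved inside this method."""

    def winner(self, state) -> Optional[int]:
        """None while the game is running, otherwise +1 (black) or -1 (white) after scoring."""
```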
To start with "less than zero", though, it would be interesting to see them give the AI a 3D simulation of a room with a simulated Go board and a simulated stone, and give the AI a fixed amount of time for it to have its turn. Just by using the pixel data from a simulated camera, it could learn to use a simulated arm to place the simulated stone on the board in a legal position. The reward function would just have to say, at the end of each allotted time period, whether a legal move had been made or not, and the AI could bootstrap up from that.
As is, no. There are too many possible actions and too little time between decisions.
I wouldn't discount it entirely though, some sort of clustering of actions may be able to reduce continuous action spaces to a manageable branching factor.
On the bright side, it means that over the several thousand years of humans playing Go, we were actually going "in the right direction" in terms of optimal strategy, despite not having reduced the game down into provably optimal mathematical theorems.
Is there anywhere to see the games? I'm curious if the AI is superior to humans or just human trained AI. It'd also be interesting to see the source, but that is apparently not being released for some reason.
Theoretically, yes. But anyone who has tried to implement scientific papers will tell you it's far harder. The papers often lack critical details and implementation hacks: all the little rough edges that go into making a production system work. They also lack context in many cases, so you spend more time reverse engineering the paper than figuring out how to make it work.
I would love to know how this adversarial training doesn't end up overfitting. Or, put another way, I'd love to see another piece of software (or even a human being?) exploit Alpha Go's overfitted strategy.
Deep reinforcement learning is interesting and has plenty of potential. But highlighting AlphaGo as an example of reinforcement learning is like undermining the concepts of reinforcement learning.
This is quite an achievement, that it plays so accurately. Does this have implications for strategy? No two games are alike, yet we are able to learn from our experience. Human players must somehow be comparing local positions from various games and deciding (sometimes wrongly) that they can be played similarly. Can it recommend not just individual positions, but general modes of play for classes of positions?
Serious question: could it be possible, even in theory, to translate a neural network to a high-level programming language? So we can see what it's doing?
How do they evaluate that AlphaGo Zero is better than the previous AlphaGos? By playing them against each other? Or playing AlphaGo Zero against humans?
But what if the other AlphaGos are blind to some tactic that humans could take advantage of? Not saying this is likely given how the Lee Sedol match and others went but I'm curious how they come up with the rankings.
At the end of the paper they describe how they come up with Elo ratings, and that in order to avoid bias from self-play only, they include the results of the AlphaGos versus Fan Hui, Lee Sedol, etc.
I love what DeepMind is doing, but find their publication choices bizarre. Sure Nature has the occasional ML / NN paper, but it's not a top journal for AI / ML / CS and it makes getting hold of papers awkward and it doesn't seem in the spirit of most CS research.