Superhuman AI in heads-up no-limit poker: Libratus beats top professionals [pdf] (science.sciencemag.org)
48 points by fmihaila on Dec 21, 2017 | 18 comments



One of the researchers on this team gave a great talk at NIPS this year about this work and explained how playing an imperfect information game like poker is fundamentally different from playing a perfect information game like chess or Go. A perfect information game has no path dependence --- once you arrive at a state, it doesn't matter how you got there. You only need to consider the current state of the game to figure out the best move to play.

In an imperfect information game this is no longer true. Now you need to consider counterfactuals --- how would the opponent have played if I had done something different earlier?

They illustrated with a toy game they called "Coin Toss". The idea is that Player A flips a coin, but Player B doesn't get to see the result. Player A then gets to choose whether to end the game there or continue playing. If they end the game, they win $1 from Player B if the coin is heads and pay $1 to Player B if it is tails. If they decide to continue playing, then Player B gets to guess whether the coin is heads or tails. If Player B guesses correctly, they win $2 from Player A; otherwise they pay Player A $2.

If you spend some time thinking about the strategy, you can figure out that the best strategy for Player B is to guess heads 25% of the time and tails 75% of the time. But if you change the rules of the game for Player A, so that they win $1 if it's tails and pay $1 if it's heads, then the best strategy for Player B reverses as well --- now Player B should guess heads 75% of the time and tails 25% of the time. Nothing about Player B's own situation has changed --- the only change happened before Player B's involvement --- yet Player B's optimal strategy flips, because Player B must consider how Player A would have acted given that the coin landed heads vs. tails.
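If you want to check those numbers, the equilibrium falls out of a few lines of code. Here's a minimal sketch (the payoffs come from the toy game above; the brute-force search and function names are just my illustration):

    # Player B guesses heads with probability p; Player A, who saw the
    # flip, best-responds. B's equilibrium mix is the p that minimizes
    # A's best-response value.
    def a_best_response_value(p):
        """A's expected payoff per flip when B guesses heads with prob p."""
        # After heads: A can end the game for +1, or play on for 2(1-p) - 2p.
        ev_heads = max(1.0, 2.0 - 4.0 * p)
        # After tails: A can end the game for -1, or play on for 2p - 2(1-p).
        ev_tails = max(-1.0, 4.0 * p - 2.0)
        return 0.5 * ev_heads + 0.5 * ev_tails

    best_p = min((i / 1000.0 for i in range(1001)), key=a_best_response_value)
    print(best_p)                         # -> 0.25
    print(a_best_response_value(best_p))  # -> 0.0 (the game is worth 0 to A)

Any p other than 0.25 hands Player A a positive edge, and if you flip A's payoffs as described, the same search lands on p = 0.75.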

So to solve these imperfect information games, you have to be very careful to model these sorts of counterfactuals. If you mess up, you leave yourself with an exploitable strategy --- that is, someone can devise a counter-strategy that beats yours in expectation.

Link to the paper here:

https://arxiv.org/pdf/1705.02955.pdf


I'm going to read the paper carefully and think hard, but going into it I still have philosophical issues with saying imperfect information is 'fundamentally' different.

Sure, the specific learning algorithm may not generalise from board games to poker. But the claim that Go is a perfect-information game is more theoretical than practical.

In practice, the 50% of moves that your opponent makes are unknown in advance. Nobody can assess the entire game tree, so the fact that it is theoretically knowable isn't actually relevant in a game. Neither computers nor humans can compute it. Once the tree can reasonably be computed, the game between high-level players is over (bar the occasional blunder).


>> Nobody can assess the entire game tree, so the fact that it is theoretically knowable isn't actually relevant in a game.

That's not the point though. The point is that in Chess or Go you only need to consider the current state when choosing your next move. This is not the case in an incomplete information game, where the sequence of moves that led to the current state carries information of its own. Basically, Chess and Go satisfy the Markov property, while Poker does not.
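To make that concrete, here's a sketch of what the policy's inputs look like in each case (the type and function names are made up for illustration, not taken from AlphaGo or Libratus):

    from dataclasses import dataclass

    # Perfect information / Markov: the current position is a sufficient
    # statistic, so the policy needs nothing but the state itself.
    def go_policy(board_position):
        ...

    # Imperfect information: the same public situation can demand different
    # actions depending on the betting that led to it, so the policy is
    # defined on information sets that include the history.
    @dataclass(frozen=True)
    class PokerInfoSet:
        hole_cards: tuple
        board_cards: tuple
        betting_history: tuple  # every bet, call, and fold so far

    def poker_policy(infoset: PokerInfoSet):
        ...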


The current state in Go is just the state of the board; the current state in poker includes some historical data. This difference is only significant to humans because we have quite poor memories.

An AI has perfect recall, and the fact that a variable is temporal really doesn't make a difference to the theory.

(Coincidental aside: this is an issue in Go as well due to the ko rule, but it isn't a major part of the game the way it is in poker.)


Which AI architecture has perfect recall, in a way that actually makes use of this information? Pretty sure AlphaGo etc does not make use of memory at all.


It's true that you can't model the entire game perfectly for any interesting game. But to play a game like Go you only need to know the current state in order to play optimally. If you play poker in a way that only accounts for your current state (i.e., you just take an expected value given the cards on the table, the cards in your hand, and the size of the pot) it won't take long for a competent player to learn how to consistently beat you. (Such a player would get tricked by bluffing, for instance.) In order to play correctly you need to model your opponent's strategy by understanding what their past moves were so that you can guess their probability of bluffing.
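A classic toy version of that modeling (my numbers, not the paper's): on a river where the bettor's range is polarized into nut hands and bluffs, the equilibrium bluffing frequency and the caller's pot odds are pinned by the same indifference condition:

    def equilibrium_bluff_fraction(pot, bet):
        """Fraction of bets that should be bluffs so a call breaks even:
        solve b * (pot + bet) - (1 - b) * bet = 0 for b."""
        return bet / (pot + 2.0 * bet)

    def required_call_equity(pot, bet):
        """Pot odds: minimum win probability that justifies a call."""
        return bet / (pot + 2.0 * bet)

    # A pot-sized bet should be a bluff one time in three, and the caller
    # needs exactly 1/3 equity to call.
    print(equilibrium_bluff_fraction(100, 100))  # -> 0.333...
    print(required_call_equity(100, 100))        # -> 0.333...

A player who ignores the opponent's actual bluffing frequency (i.e., ignores the history) will call too often or too rarely against it, and either error can be exploited indefinitely.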


I wonder whether an AI is as reluctant to go bankrupt as a human is. Maybe that reluctance is abstracted away by the betting evaluation.


Yeah, that's a major flaw in their research design. In their 120k-hand competition they reset the players' bankrolls after each hand in order to have the same number of hands in each trial. This dramatically changes the payoff for aggressive play. The humans were too cautious against the bot, because their experience was in an environment where going bust could end the session.

In real play, going bust* doesn't just lose your current stack, but also all future possible winnings. In their experiment, going bust only loses that stack. Kind of like being a soldier rather than a general, or founder rather than a VC.

* Yes, you can buy in again, but eventually you've exhausted your bankroll.


I don't believe this meaningfully affects top HUNL cash players, who tend to have auto-rebuy set anyway and at least believe they're bankrolled for most games they play. If anything, it might help them a bit by letting them continuously play at the stack size they have the most experience with.


Bankrolled to what extent? The analysis after the experiment showed that the bot pushed small edges far harder than the humans did. Little else differed. That's consistent with my theory.


There's some previous discussion of the "beats top professionals" part of this here: https://news.ycombinator.com/item?id=13534778 https://news.ycombinator.com/item?id=13358138

Note that this shouldn't really be a surprise - it's very likely that the best HUNL AIs were as good as or better than the best humans maybe 5 years ago. The recent developments over the last couple of years are more about widening the gap and increasing the confidence in the results. I haven't fully read this paper yet, but my impression is that it's mostly about architectural and abstraction improvements, and possibly just chucking more hardware at it.


Interesting research, although people familiar with poker would not consider Jason Les, Dong Kim, Daniel McCauley, and Jimmy Chou "top professionals". They may be very good at HUNL but they are definitely not the "top".

Many people will look at this and say, "computers have solved poker" without understanding that it applies to heads-up poker only. Full-ring poker, where an uncertain number of non-GTO players are still to act behind you, is an entirely different beast.

I'd also be interested in knowing how deep the stacks were in the human vs computer matches. The complexity of the game is directly related to how deep stacks are. A 10 big blind hu match, a 100 big blind hu match, and a 1,000 big blind hu match are all three entirely different games with entirely different decision trees.


Video of an interesting panel from AAAI-17 with the heads of the two teams who built human-level heads-up no-limit programs.

http://videolectures.net/aaai2017_bowling_sandholm_poker/

They use very different approaches (deep learning versus clever game-tree pruning). It's not clear how to tell which of them is the stronger program. Both teams seem to think that theirs is stronger, but I think only Libratus (the pruning-based approach) has beaten top poker pros in a tournament.


> beaten ... in a tournament

They didn't play a real tournament with cash at stake.



The demo of this at NIPS was awesome!


One interesting error in their paper: 27o (two-seven offsuit) is actually the worst hand in a 10 player game. 23o (two-three offsuit) is the worst hand heads-up.


Author of the paper here. This actually depends on the stack sizes. At 200BB deep, 27o is the worst hand, though 23o is not far behind.



