DeepStack: Expert-Level Artiﬁcial Intelligence In Heads-Up No-Limit Poker

LeanderK · on March 16, 2017

How can one create visualisations like Figure 3 (assuming it's latex)[0]? I have to write my Bachelor thesis soon and need to improve my latex-skills. I only know to to use the math-mode and write text.

[0] the paper: https://arxiv.org/pdf/1701.01724.pdf

mikebjohanson · on March 16, 2017

Hi - I'm one of the authors on this paper. We had professional designers do the figures for this paper; Fig3 in particular was far better than what we could do ourselves.

andr3w321 · on March 16, 2017

Just working my way through the pseudocode to try and gain a little insight. I think there may be a small typo. Line 29 VALUE should be VALUES?

fhackenberger · on March 16, 2017

Probably TikZ. There are several frontend UIs available for accelerating the process of creating diagrams. One is ktikz, which I initiated a while ago.

freyr · on March 16, 2017

You can create images using your favorite image creation software (e.g. Adobe Illustrator, Inkscape, MS Paint, D3, anything) and embed them in a LaTeX document.

nerdponx · on March 16, 2017

Don't forget GraphViz. Although it's probably underpowered for that figure in particular.

mastazi · on March 16, 2017

non-pdf link: https://arxiv.org/abs/1701.01724

mastazi · on March 16, 2017

With regards to Texas Hold'em, Conterfactual Regret Optimisation [1][2] was already able to beat humans in limit games, while it seems that this new approach is able to do the same in no-limit games. My naive understanding is that no-limit games are harder to crack because there are more possible outcomes, thus the available information (poker is an "imperfect information" game) becomes "even more imperfect".

[1] https://arxiv.org/abs/1407.5042 (yes, the author is the same Oskari Tammelin who created Jeskola Buzz, a popular piece of music software).

[2] http://www.nature.com/news/game-theorists-crack-poker-1.1668...

PaulHoule · on March 16, 2017

One thing I haven't seem come up in discussions or media coverage about this is that the heads up game is entirely a different thing than playing with 5 other players.

Tenoke · on March 16, 2017

For what is worth, heads-up is a very popular game, and from reading the paper, the approach should more or less be applicable to 6max or full ring, except it will be way more computationally expensive (but not unfeasibly so like before).

jumpCastle · on March 16, 2017

In my understanding the approach is based on game theoretic principles that do not extend naturally to more than two players. If that's correct then it's not only a computational issue. See http://www.nature.com/news/how-rival-bots-battled-their-way-...

peternilson · on March 16, 2017

Yip, exactly this. The CFR algorithm they speak of is based on finding a nash equilibrium for each strategy-pair in the game tree. Something that I think can only be computed for 2 players, and no more.

mikebjohanson · on March 16, 2017

There has been quite a bit of work (and several papers) on using CFR for games with >2 players. It is not theoretically guaranteed to converge to a Nash, and usually doesn't in practice either (there's one case in a toy game, in 3p Kuhn poker, where it does converge). However, even though it doesn't converge to Nash, the algorithm is still well defined, and the resulting strategies appear strong in competitions against other computer adversaries, as demonstrated in the Annual Computer Poker Competition.

Tenoke · on March 16, 2017

Surely you can adapt it to compute an approximation of a mixed Nash equilibria. My game theory isnt at a high level, but there are similar computations you can perform, and when I used to play poker, you'd have common calculations based on it for more than 2 players when you are playing with a limited stack. I dont see how that wouldnt be adaptable (abeit more computationally expensive) for more players, but I might be missing something.

jumpCastle · on March 16, 2017

I'm not sure Nash equilibrium exists for more than two players in this game.

mikebjohanson · on March 16, 2017

Nash equilibria are still guaranteed to exist. But it's only the 2p zero-sum perfect recall case where an equilibrium has useful properties, like being robust against any opponent strategy, including a worst-case opponent who knows your strategy. In a multiplayer (> 2 players) game, opponents can collude against you and playing your part of a Nash gives no useful guarantees on performance. Even if they aren't colluding, in poker games, the presence of a bad player just before you in turn order can hurt your EV worse than their own. And even if all players independently compute their own Nash equilibria (there can be many) and use the piece for their position, then that combination of strategies may not itself be a Nash equilibria.

xapata · on March 17, 2017

According to Wikipedia, and my memory from school, Nash equilibrium doesn't exist unless all players are aware of the optimal strategy and are seeking it.

jumpCastle · on March 16, 2017

Thanks for the explanation. I will also appreciate a reference to a good survey paper if you are in the mood :)

mikebjohanson · on March 16, 2017

I don't know of a survey paper on CFR for multiplayer, but it's been showing up in conference papers and theses.

Here's a link to a shorter conference paper where CFR does converge to a Nash, in 3p Kuhn poker. It describes a family of equilibria, where one player (the second to act, IIRC) has a parameter that can't affect their own EV (...or else it wouldn't be a Nash), but does determine how much the other two players win/lose from each other. This illustrates the problem in equilibria for multiplayer games: if you're players 1 or 3, then even if you are playing a Nash, and everyone else does too (albeit different equilibria), then you can still lose. https://webdocs.cs.ualberta.ca/~games/poker/publications/AAM...

For a longer read, the best I know of is probably Rich Gibson's PhD thesis. He focussed on CFR for multiplayer games.

https://webdocs.cs.ualberta.ca/~games/poker/publications/gib...

jumpCastle · on March 16, 2017

Thanks again. So I guess for multiplayer games a better approach would be reinforcement learning with no game theoretic heuristics?

xapata · on March 17, 2017

Game theory is always applicable to games :-)

Nash equilibrium is just one aspect of some games.

jumpCastle · on March 17, 2017

Game theory is domain specific. Generic methods in AI tend to dominate domain knowledge over time. Although I agree that other game-theoretic techniques might help here.

xapata · on March 17, 2017

Game theory is specific to the domain of agents optimizing outcome in adversarial, cooperative, or (rarely) solitary systems. That's a pretty big domain.

andr3w321 · on March 16, 2017

Anyone know where to find the pseudo code? It references item 10 in the References and Notes which states See Supplementary Materials.

foob · on March 16, 2017

They're included in the arXiv version of the paper: https://arxiv.org/abs/1701.01724

matsemann · on March 16, 2017

Pretty helpful with the annotation along the way, explaining mbb/g and what it means, for instance.

grumpopotamus · on March 16, 2017

Any hints on how to implement CFR-D on the GPU?

IMTDb · on March 16, 2017

This seems to be a major threat to online casinos. Is there any study on how they plan on fighting the AI's that will likely be flooding the market soon ?

andr3w321 · on March 16, 2017

The market has been flooded with them for years. They have various mechanisms for fighting them like mouse detection and scanning your computer for various programs and confiscating your funds if caught. In the end, people will always find a way around the detection methods though.

SamBam · on March 16, 2017

Like using a chess computer, it seems that there would be no way to protect against someone who relays the cards to a separate machine, and then enters the bids by hand on their own computer.

throw_away_777 · on March 16, 2017

There are ways to detect cheating on chess websites, though these methods are not perfect. The two main signatures are:

1) Playing moves that the engine recommends consistently. This looks for people who are inhumanly good, even the top chess players in the world regularly deviate from engine suggestions.

2) Time between moves always takes a minimum threshold, the time needed to relay the position to the engine. Humans vary wildly in the time thought based on the position, in a way that cheaters do not.

Neither of these detection methods are perfect, especially for small numbers of games. I don't think we will see large prizes for on-line chess tournaments because of possible cheating.

matsemann · on March 16, 2017

Somewhat easy to detect a human using a computer, by comparing to what the computer would do, the rating of the player and the usual play of the player. However, a chess professional would be able to use a chess computer almost undetected, as they know what would look like "computer moves", but could use it to just choose the best of their own suggested moves.

Tenoke · on March 16, 2017

Actually, there are - most(all?) of the popular algorithms play in somewhat specific ways that are somewhat detectable (or at least flaggable).

gdulli · on March 16, 2017

Heads-up is a contrivedly simple scenario compared to solving poker in general. There have been bots for a long time that have co-existed with real players, and we're likely not close to a scenario where bots can reliably beat real players in general for real world scenarios.

However, in theory even a break-even bot could be a gold mine because it can earn loyalty points. That's how it was the last time I was playing online, I don't know if loyalty programs still work the way they used to. Of those programs would just go away if abused enough.

Tenoke · on March 16, 2017

Loyalty points are paid out of (and lower than) the rake, so the bot still needs to be better than break even.

alasdair_ · on March 16, 2017

Online poker has the concept of "bonus whoring" where one takes advantage of various loss-leader promotions offered by different sites.

A typical promotion would be offering a 20% bonus on a $1000 deposit, with the caveat that you must play in, say, 1000 raked games before cashing out. It's possible to play in games where the total expected rake if the user breaks even is, say, $100, leaving $100 profit.

sherjilozair · on March 16, 2017

Why do you say we're not close to solving Poker in real-world scenarios? The multiplayer version is likely to be harder, but is it so hard, that it'll take decades more?

xapata · on March 17, 2017

Depends what you're trying to achieve. If it's a bot that beats most humans, that probably already can be done. If it's a bot that earns more from a cash ring game than the world's best would against all levels of players...

It took many years for a computer to bridge the gap between better at chess than almost all humans and better than the best human.

Tenoke · on March 16, 2017

When they released the initial paper a few months ago I looked into it and it sounds like this wont be profitable anywhere except the highest levels (although obviously the highest costs are in the training phase).

This in combination with casinos already doing a lot for bot-detection makes me think that we are at least a few years away until this is an issue.