More

noambrown · on Jan 19, 2022

That's not true in practice for poker. Pluribus showed that if you run CFR in multiplayer poker you get a solution that works great in practice. Multiple equilibria are certainly a theoretical issue for many games, but poker conveniently isn't one of them.

bluecalm · on Jan 19, 2022

It's not about multiple equilibria but about (often unintended) collusion. Examples of that affecting poker games are very well known. One frequently occurring example was discussed in online community 15-20 years ago (BTN raises in a limit Holdem game, SB calls too much which hurts both the SB and the button giving equity to BB).

I don't think you're correct saying it doesn't affect poker as people were able to notice and analyze this before solvers. It's true though that no-limit Holdem as played today (two blinds,no ante,deep stacks) is likely not strongly affected by the phenomena. I don't agree Pluribus experiment shows much when it comes up practical play. Not enough variety of skill levels, not enough hands and not enough time for metagame (people adjusting to how others play) to develop. I do agree pure equilibrium play is most likely not terrible in cash game nlhe but definitely not in poker in general.

noambrown · on Jan 19, 2022

Bots are superhuman in self-play Hanabi: https://ai.facebook.com/blog/building-ai-that-can-master-com...

The remaining challenge is getting it to play well with human partners. Doing that requires modeling human conventions rather than learning weird bot conventions. That's hard because while you can collect essentially unlimited data through self play, it's hard to collect a lot of data playing with humans using reinforcement learning. AI algorithms are really bad at sample efficiency.

noambrown · on Jan 18, 2022

Bots are superhuman in no-limit Texas hold'em. Libratus beat top humans in two-player in 2017 and Pluribus beat top humans in six-player in 2019: https://www.science.org/doi/abs/10.1126/science.aao1733 https://www.science.org/doi/abs/10.1126/science.aay2400

It's shocking that the reporter didn't mention these results or anything else more recent than 2015.

hooande · on Jan 18, 2022

Have to say, Pluribus beating top humans for 10,000 hands isn't the same thing as being super human. It's just too small of a sample to make that claim.

Further, thousands of the hands that Pluribus played against the human pros are available online in an easy to parse format [0]. I've analyzed them. Pluribus has multiple obvious deficiencies in its play that I can describe in detail.

It seems like it's very difficult to set up any kind of proper repeatable and controlled experiment involving something as random as poker. Personally, I would be much more convinced if Pluribus played against real humans online and was highly profitable over a period of several months. This violates the terms of service / rules of many online poker sites, but it seems like the most definitive way to claim terms like "solved" or "superhuman"

[0] http://kevinwang.us/lets-analyze-pluribuss-hands/

noambrown · on Jan 18, 2022

Normally 10,000 hands would be too small a sample size but we used variance-reduction techniques to reduce the luck factor. Think things like all-in EV but much more powerful. It's described in the paper.

jessriedel · on Jan 19, 2022

Where is the variance-reduction technique discussed? I looked at this paper

https://www.science.org/doi/abs/10.1126/science.aay2400

https://par.nsf.gov/servlets/purl/10077416

and it just says

> Finally, we tested Libratus against top humans. In January 2017, Libratus played against a team of four top HUNL specialist professionals in a 120,000 hand Brains vs. AI challenge match over 20 days. The participants were Jason Les, Dong Kim, Daniel McCauley, and Jimmy Chou. A prize pool of $200,000 was allocated to the four humans in aggregate. Each human was guaranteed $20,000 of that pool. The remaining $120,000 was divided among them based on how much better the human did against Libratus than the worstperforming of the four humans. Libratus decisively defeated the humans by a margin of 147 mbb/hand, with 99.98% statistical significance and a p-value of 0.0002 (if the hands are treated as independent and identically distributed), see Fig. 3 (57). It also beat each of the humans individually.

alasdair_ · on Jan 19, 2022

>The remaining $120,000 was divided among them based on how much better the human did against Libratus than the worstperforming of the four humans.

Surely the correct strategy here is for the human players to collude to give as much money as possible to a single player and then split the money afterwords, no?

Also, the fact that they players can only gain money without losing anything likely changes their play somewhat. By default I'd assume (and have generally observed) that most players on a freeroll (or better than a freeroll really) tend to undervalue their position and gamble more than is usually wise.

I'd definitely be interested in seeing a "real" game where the humans are betting their own money.

noambrown · on Jan 19, 2022

The four humans were getting $120,000 between them. Their share of that was dependent on how much better they did than the other humans. That means there was no incentive to collude.

Top pro poker players understand the value of money. They weren't treating it as a freeroll and anyone that has seen the hand histories can confirm that.

noambrown · on Jan 19, 2022

It's in the supplementary material of the 2019 paper: http://www.cs.cmu.edu/~noamb/papers/19-Science-Superhuman_Su... . Look at the "Variance reduction via AIVAT" section.

bostonsre · on Jan 19, 2022

Do you think human players could use the results of this paper to learn how to be better poker players? I'm wondering if it could be an alpha go type situation where players learned different strategies.

lawrenceyan · on Jan 18, 2022

Noam is being too humble, but he’s one of the primary creators of Libratus/Pluribus as an fyi.

bko · on Jan 18, 2022

The article was fascinating but honestly read like an ad for PioSolver

bluecalm · on Jan 19, 2022

The journalist who contacted me told me he did so because the software keeps coming up when he talked to pro players. While it's certainly not one to advance the science the most (talk to Noam Brown if you want that), not the fastest (talk to Oskari Tammelin about that), it's still very popular and the first to get big following. It changed the game and got into the online poker culture. I am quite proud of that and I think it's deserved to be mentioned a lot in an article about how computers changed poker.

howdydoo · on Jan 19, 2022

I know that PioSolver is not a "poker AI" per se, but the article seems to say it can tell you what to do based on the table situation. Has anyone tried pitting pro players against PioSolver?

jeffreyrogers · on Jan 19, 2022

PioSolver requires putting in the hand range of the opponent, so the quality of PioSolver's solution is largely down to how accurate the guess at that hand range is. But if a pro knows he is playing against PioSolver configured with a certain hand range he can just change his strategy to adapt. In theory though if PioSolver knows the correct hand range then it shouldn't be possible to any better than tie given enough hands.

noambrown · on Dec 6, 2019

The search algorithm shares a lot in common with our Pluribus poker AI (https://ai.facebook.com/blog/pluribus-first-ai-to-beat-pros-...), but we added "retrospective belief updates" which makes it way more scalable. We also didn't use counterfactual regret minimization (CFR) because in cooperative games you want to be as predictable as possible, whereas CFR helps make you unpredictable in a balanced way (useful in poker).

The most surprising takeaway is just how effective search was. People were viewing Hanabi as a reinforcement learning challenge, but we showed that adding even a simple search algorithm can lead to larger gains than any existing deep RL algorithm could achieve. Of course, search and RL are completely compatible, so you can combine them to get the best of both worlds, but I think a lot of researchers underestimated the value of search.

hooande · on Dec 7, 2019

I just spent three weeks going through your research. Thank you for that work, especially the supplementary materials.I wish I'd known how much the ideas in the pluribus paper depended on reading the libratus paper.

I see what you're saying about the real time search (which took me quite some time to understand). I came up with a way to do that from disk due to memory limitations. It limits the number of search iterations but doesn't seem to have a huge negative impact on quality so far.

Anyway, thanks again!

noambrown · on Dec 6, 2019

Hi! I'm one of the authors on the paper. We'd be happy to answer any questions. Ask us anything!

gjstein · on Dec 6, 2019

Hey Noam, this is some great work; I'll need to sit down and give the paper a deeper read. Also, the visualizations on this blog post are incredible.

I saw a talk on the Libratus agent a while back, and one of the most interesting takeaways was that the behavior of the bot had already started to impact the professional players, who now spontaneously bet large amounts to force other players out of a hand. Were there any behaviors your agent demonstrated that surprised you in the same way? What insights might we draw from this cooperative AI system that may have more general applicability to other planning domains?

noambrown · on Dec 6, 2019

In terms of Hanabi, this bot arrived at conventions that are pretty different from how humans play the game. We invited an advanced Hanabi player to play with the bot and he pointed out a few things in particular that he'd like to start using. For example, humans usually have a rule that if your teammate hints multiple cards of the same color/number, you should play the newest one. The bot uses a more complicated rule: if the card you just picked up was hinted then play that card, otherwise play the oldest hinted card. That gives you way more flexibility to hint playable cards that would otherwise be tough to get played.

I think one important general lesson is that search is really, really important. Deep RL algorithms are making huge advancements, but Deep RL alone can't reach superhuman performance in Go or poker with search. Here, too, we see that search was the key to conquering this game, and I think that will hold true in more complex real-world settings as well. Figuring out how to extend search to more complex real-world settings will be a challenge, but it's one worth pursuing.

JoshTriplett · on Dec 6, 2019

> For example, humans usually have a rule that if your teammate hints multiple cards of the same color/number, you should play the newest one. The bot uses a more complicated rule: if the card you just picked up was hinted then play that card, otherwise play the oldest hinted card. That gives you way more flexibility to hint playable cards that would otherwise be tough to get played.

I've definitely seen advanced Hanabi players use a more subtle version of that rule: "If your hint looks like it's telling me to play my leftmost hinted card, how long has that card been playable? If it could have been hinted for play a long time ago, and it's just being hinted now, it must not be playable. So what else must you mean...?"

That version of the rule allows for more subtle cases. Suppose you hint that a player's second-from-the-left and fourth-from-the-left cards are both red. If there hasn't been an opportunity to hint the second-from-the-left since it became playable, go ahead and play the second-from-the-left. If there have been opportunities to hint second-from-the-left, play fourth-from-the-left.

That rule requires human players to model whether the other players' actions in the interim have been "urgent" things that needed taking care of before hinting them, or whether those other players would have hinted them sooner if their card was playable.

Madmallard · on Dec 6, 2019

Isn't this not the same thing as AI that can beat humans at things like Bridge where the bidding game matters quite a lot? IIRC in Hanabi the fact that there is imperfect information does not really matter that much for strategy, where as in things like League of Legends or Bridge or many of those types of games it really does matter quite a lot.

noambrown · on Dec 7, 2019

Bridge has a similar challenge, though from what I understand Bridge AIs are not superhuman yet. I suspect our techniques could be applied to Bridge, though they may need to be adapted a bit.

The imperfect information in Hanabi absolutely matters a ton. It's not an interesting game without it.

AndrewKemendo · on Dec 6, 2019

Can you come back in a day or so and answer some questions? I, like others, need some time to digest it.

noambrown · on Dec 6, 2019

Definitely!

kevinwang · on Dec 7, 2019

Damn, I haven’t gotten around to fully reading Pluribus yet, and now there’s more? Congrats on the results! What’s next?

noambrown · on Dec 7, 2019

Thanks! We're looking in a few different directions, but one thing I'm excited about is mixed cooperative/competitive settings. In poker, there is no room for cooperation. In Hanabi, you are 100% cooperating with your teammates. But most real-world situations, like a negotiation, are somewhere in between. The AI techniques for these settings are not too strong yet.

carapace · on Dec 6, 2019

What's FB going to do with this?

noambrown · on Dec 6, 2019

Open source it, learn from it, and build upon it to continue to push forward the frontier of AI.

nojvek · on Dec 7, 2019

That would be nice. What is Facebook AIs take on ethical use of its research?

Facebook probably pays through the nose for AI research and probably wants a ROI. Facebook makes money by building better user models and spamming targetted ads. Some of them are getting scarily good.

noambrown · on July 12, 2019

From an AI and game theory standpoint, there isn't much difference between two-team zero-sum and two-player zero-sum if the teammates are trained together. That said, the Dota 2 work is extremely impressive for a variety of other reasons.

solidasparagus · on July 12, 2019

There is far more ambiguity when you are competing against five mostly-aligned strategies vs a single shared strategy.

noambrown · on July 11, 2019

We considered both options but decided to go with 100BB because that is the standard in the poker world. It doesn't make a big difference for these techniques though.

srkigo · on July 22, 2019

Could you try to run a training with ante included in the pot? I wonder if open-limping would be a viable strategy with some hands. No one knows that and it would be really interesting to find out. Ante should be equal to BB, like it was in WSOP Main Event.

ayemeng · on July 11, 2019

Does Pluribus care about opponent's stack depth? From the examples, it appears stacks were reset after each hand.

noambrown · on July 11, 2019

Training was super cheap. It would cost under $150 on cloud computing services.

The training aspect has some improvements but is at its core similar to Libratus. The search algorithm is the biggest difference.

There aren't that many great resources out there for helping new people get caught up to speed on this area. That's something we hope to fix in the future. Maybe this would be a good place to start? http://modelai.gettysburg.edu/2013/cfr/cfr.pdf

dharma1 · on July 11, 2019

Is Oskari Tammelin still working on this stuff? I remember he wrote some very fast CFR optimisations a few years ago

noambrown · on July 11, 2019

No, it wouldn't be hard. We chose six players because it's the most common/popular form of poker.

Also, as you add more players it becomes harder and harder to evaluate because the bot's involved in fewer hands, we need to have more pros at the table, and we need to coordinate more schedules. Six was logistically pretty tough already.

noambrown · on July 11, 2019

It was online. The players were playing from home on their own schedules. The bot did not look at any tells (timing tells or otherwise). The players knew they were playing a bot and knew which player the bot was.