How A.I. Conquered Poker (nytimes.com)
248 points by jdkee on Jan 18, 2022 | 181 comments




(Former pro and high stakes player, occasional solver developer)

This is actually one of the best poker articles I've ever seen in generalist media. Not too clickbaity, reasonable high level overview of game theory, a (very accurate IMO) quote from old pro Erik Seidel about the state of the game just 15 years ago, a discussion on variance vs. EV and results, and most importantly, an emphasis on math, randomization techniques and emotional control rather than the TV image of staring in someone's eyes and reading their soul. Probably the one biggest misconception people have is that pros have sick reading abilities since TV likes to emphasize staredowns, when the actual single biggest skill long term pros have is the ability to lose hand after hand for hours and still play their best game.

Incidentally, the anecdote at the top of the article is pretty intuitive game theoretically. Basically on the river you need to bluff with some portion of your hands, or else nobody will ever call when you have a good hand. The natural portion of your hands to bluff with is the absolute worst ones - you don't want to bluff with your middling hands because you have some small chance of just winning a showdown when it checks around. On a board of Kc4c5c2d2c, 7d6d is quite likely the absolute worst hand you can hold given the action that's taken place, therefore it's the one you bluff with. (For some pot/stack sizes it's possible that you bluff so rarely that you have to choose between 7d6d/7h6h/7s6s, but that's getting into details.)
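To make the "you need to bluff with some portion of your hands" point concrete, here is a minimal sketch of the standard polar-range river model. The function name is mine and the model deliberately ignores card removal, raises, and earlier streets; it only illustrates the indifference idea, not real solver output.

```python
# Simplified polar-range model of river bluffing frequency
# (ignores card removal, raises, and earlier streets; illustrative only).
def bluff_fraction(pot: float, bet: float) -> float:
    """Fraction of the betting range that should be bluffs.

    The caller risks `bet` to win `pot + bet`, so they are indifferent
    when bluffs / (bluffs + value) == bet / (pot + 2 * bet).
    """
    return bet / (pot + 2 * bet)

# A pot-sized bet supports one bluff for every two value bets:
print(bluff_fraction(pot=100, bet=100))  # ~0.333
```

This is also why the bluffing region can shrink to just a hand or two (7d6d vs 7h6h vs 7s6s): with small bets relative to the pot, the required bluff fraction gets tiny.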


Similar story here. I've always played, mostly online: before Black Friday (pretty sure I was a minor when it happened) and then on all the offshore sites through college. My professional software career took off and I bounced out of poker; finally, ~3y ago, I decided to get back into it and would only play challenging games (the only thing challenging about 1/3 or tourneys is the grind). So I started with 2/5, and within a week was playing the 10/25 and 25/50.

1. Emotional control and pushing through the variance are key. I would play all-nighters, and it was mostly a subtle rise up vs big pots (except the one that did me in)

2. You absolutely do read players; I didn't need to be a pro to learn how to observe the degenerates. Even decent players signaled things here or there (obviously not how they eat an Oreo, but I had a knack for knowing when someone was off their style). And it's not even close with a normal group of friends.

3. The sharks and their meta game… I had run up to about 50K within 3 weeks from my starting 2-3K. Had the pipe dream. I thought I had a growing bond with the pros. When I say pros I'm not talking grinders, I'm talking consistent big winners. About a handful of them, none of them TV personalities. There were a few times we'd run into each other, and I picked off a few donk bets here or there. However, their persistence is what did me in. I went the entire month observing their advanced play style; the meta was harsh, and there were few rivers that mattered more than pre-flop and continuation/donk betting. This style of play was like nothing I was seeing on TV. But I was winning with them. The last time I played, about a month into my run, I was on a 12h 25/50 game winding down, had turned 30k into 70k (with 20k sitting back), pulled QQ, and went up against a pre-flop donker. I face a 3-bet and get that gut feeling, but put him on a lower pair or AK. Flop comes rainbow, large betting action. Turn is junk; I face an insane re-raise after testing a raise of my own. I know it's off, but it's either AA/KK or AK at this point. My gut's off but I'm too far in to quit. On the river I'm defeated: check, face a bet, and ultimately end up throwing 50K into the pot. Lose to aces.

I never had a gut feeling so intense though. Like I was screaming at myself to get out. But I was caught.

Such a draining hand I never went back. I don’t have the discipline to grind games, and the tech industry pays more.


When you say tech industry pays more, I'm assuming the effective long-run hourly earnings of poker players is way less glamorous than it looks from an outsider seeing highlights? Doing a basic google search led me to https://pokerdb.thehendonmob.com/ranking/6737/, which seems to suggest that 81 players earned over 1mm USD in 2021. But presumably those players don't earn that YoY, and even if they did, plenty more software engineers earn > 1mm USD per annum, so this seems to support your conclusion. OTOH, I have no insight to the home games or whatever sources are not included in these rankings, so I'd love to hear a little more.


FanaHOVA mentioned the main thing you missed. One other thing I'd mention is that a lot of pros have staking arrangements (the article briefly mentions some versions of them). For example, I know one pro (who's won one of the more prestigious WSOP bracelets) who gets 30% of his tournament winnings in exchange for having all his entry fees paid by his backer. I suspect he's traded off more upside in favor of stability than most, but there's a lot of deals out there where the players aren't taking home anywhere close to 100% of their public winnings.


Those aren't net of buy-ins. Negreanu for example won $3.1M but spent $2.56M in buy-ins, so net profit was around $600k (doesn't account for all his content ofc, but that's not every player)


The other commenters said it for tourneys. I can’t stand tournaments.

For cash games, I was only playing at the casino. At 1/3 ($1/$3 blinds) the earning rate seems absolutely abysmal, and you have to account for rake. There are grinders who play these games, along with 2/5. My assumption is any grinder playing here is down or not making enough to compete with a professional career in the tech industry.

Next is 2/5. Here I could've sustained a decent salary. A pro at these stakes showed me his past year's earnings, which were 180k.

It's hard to tell with the 10/25, 25/50 pros. There were definitely dudes sitting on 500k+ in chips, maybe way more, that they kept in the casino, and I assume similar amounts of cash at home. There were guys leaving with big nights but still losing 90k in a night. I'd hear about some being down 200K at some point. There were probably some money launderers. There was the occasional private-investment-fund guy who came in: short, slicked-back hair, very loud mouth, would brag and show off one of his personal bank accounts with 17M in it. Then there were the very quiet, calculated guys. It seemed they were operating on some formula I never cared to figure out. They'd avoid big hands for the most part. Some form of grind I didn't have the discipline for. I also didn't have the discipline to handle large swings at those stakes, nor the bankroll, nor the desire to play lower stakes to build one.


You lost $50k in a 25/50 online game? Was the pot 2000 BBs??? I've never even heard of anything close to it.


Casino, there was no max. A lot of players sitting on 100k+. There were far bigger hands than this one.


> Probably the one biggest misconception people have is that pros have sick reading abilities since TV likes to emphasize staredowns, when the actual single biggest skill long term pros have is the ability to lose hand after hand for hours and still play their best game.

This is absolutely true, but let me add to this: live pros like Phil Ivey, who have played the game in person virtually their entire lives, do have sick reading ability. However, it mostly works against amateurs. Actual pros know how to hide their tells better. If you take an online pro who has been clicking at the computer all day on multiple tables, that pro will not have anywhere close to the reading ability of Phil Ivey.


I think old school pros like Ivey and Dwan are very good at reading the entire table. Dwan emphatically and correctly states who had the best hand against his bluff [1]. Ivey emphatically and correctly basically calls a guy a liar at the end of a hand over what the guy said he folded [2]. They have amazing "situational awareness".

[1] https://youtu.be/dS_uv88YuPs?t=179 [2] https://youtu.be/IN1bW4jo4i4?t=72


Indeed, but it should be made clear that these kinds of reads aren't necessarily (or likely) coming from physical appearances alone - there's all sorts of metadata to consider. For example, in the Dwan hand, the timing of Eastgate's call on the flop tipped his relative hand strength, showing that he had a very strong hand but also capping his range. A hand weaker than trips would have deliberated between folding and calling, and a hand stronger than trips with a weak kicker would have likely paused to consider raising. Of course, savvy players can fake these timing tells, but Dwan also knew that Eastgate was out of his depth (a tournament player making a charity appearance on a televised cash game because he'd just won the WSOPME and had yolo money), so he was unlikely to be doing this - more situational metadata.


So how much have you won and lost in total? I don’t have any knowledge of poker, so I don’t know how to ballpark this. And the impersonal questions are boring after seven lifetimes, so I thought I’d ask.

It would be interesting to see a GitHub style graph of poker activity, with won and lost corresponding to added lines and removed lines.


6 figures. Turns out there's a lot more (and less risky) money in software development than there is in poker. I gave up the professional poker life pretty early and actually made quite a lot more as a talented "amateur" with a Silicon Valley software dev salary that took away the risk-of-ruin constraints I had as a pro. Plus I didn't really enjoy playing professionally anyway - it's pretty mind-numbingly boring to play the number of hands you need to in order to get the EV/variance tradeoff to some acceptable point, at least for me.

My highest and lowest sessions were in the 5 figure range (cash game player, not tournament), so I was never up in the really nosebleed stakes, but I was high enough up that a lot of the nosebleed players would play in my games. Some of them have an impressive ability to play their A-game even when switching between games an order of magnitude in stakes apart. Some of them, uh, don't. People you've seen on TV are overwhelmingly more likely to be in the second category and were generally welcome in my games, at least for monetary reasons. (There was an old screenshot of Phil Hellmuth sitting at a full 9- or 10-handed 300/600 limit hold'em table online. He's out of chips and every other player is sitting out, waiting for him to reload. I have a pretty similar live story about him.)


I imagine that playing at their best can be taxing for pro players, and I wouldn't be surprised if some just wanted to play less costly games for what they deem to be more fun.


Are you Paul P?


Nah, he's def Tony G


Guessing you're talking about extempore on HN, but no.


> The natural portion of your hands to bluff with is the absolute worst ones - you don't want to bluff with your middling hands because you have some small chance of just winning a showdown when it checks around.

I don't understand this logic, can you elaborate a bit more? What do you suggest to do with middling hands then?


One very simplified model is that on the river, all your hands fall into certain buckets, from strongest to weakest. (I'm ignoring a ton of nuance here and there are actually many more buckets in a real game)

- Worth betting, because you have the best hand

- Worth calling, because your hand is good enough to beat a bluff (and maybe some value bets as well).

- Intending to fold, because your hand is bad, but maybe it's good enough to win if the opponent's hand is worse and they don't bluff.

- Worth bluffing, because your hand is so bad it can't win otherwise.

The middling hands would literally be the middle two buckets in this example. Call with the better ones, fold with the worse ones. (To complicate this more, in a real world situation the worst part of the "intending to fold" bucket might become a "planning to bluff-raise" bucket, due to similar logic as to why you bluff with your worst hands.)
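The bucket model above can be sketched as a toy function. Everything here is hypothetical for illustration: the percentile thresholds are made-up numbers (a real solver derives them from both players' ranges and the bet size), and the bluff-raise refinement mentioned in the parenthetical is omitted.

```python
# Toy illustration of the four river buckets described above.
# Thresholds are invented for illustration only; a solver would
# derive them from the actual ranges and bet sizes in play.
def river_action(strength: float) -> str:
    """strength: hand-strength percentile in [0, 1] vs villain's range."""
    if strength >= 0.70:
        return "value bet"    # best hands: bet, because you win when called
    if strength >= 0.40:
        return "check-call"   # good enough to beat bluffs at showdown
    if strength >= 0.10:
        return "check-fold"   # bad, but may win an unbet showdown
    return "bluff"            # so bad it can't win any other way

for s in (0.95, 0.55, 0.25, 0.03):
    print(f"{s:.2f} -> {river_action(s)}")
```

The point the toy makes is structural: the bluffs come from the bottom of the range, below the hands that still have some showdown value.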


Yeah, but that only works in the very simplified model. In the real game you pretty much always choose bluffing hands based on card removal (the ones that make your opponent holding a calling hand less likely).


I think he’s saying you don’t want to “waste” a bluff on a hand you might end up winning anyway just by checking it down.


> What do you suggest to do with middling hands then?

You try to get to a showdown. With middling hands there's a chance the opponent has a worse hand. It's when your hand is so bad your opponent has you almost certainly beat that you get to the bluffing territory, as that's the only way for you to win.


Interesting, I read some Sklansky books about 15 years ago and he talked about semibluffs, where you're partly bluffing but you've also got some chance to win. Sounds like strategy has evolved into doing the exact opposite.


The distinction is that semi-bluffs are for when there are still cards to be dealt.


Would we call that thin value betting on the river?


I don't really think of thin value betting and semi-bluffing as the same concept on different betting rounds. Thin value betting gets its positive expectation (versus checking) entirely from getting called with a worse hand. It doesn't really have positive expectation from inducing a fold the way that semi-bluffing does.

At least, that holds if you assume the opponent will always call with hands above a threshold and fold with hands below it. Which is correct play (though kindly ignore raises, for simplicity - but they don't blow up the argument). Only if they behave very strangely, e.g. folding with the very best hands and worst hands but calling with the decent ones, can I see a better connection with semi-bluffing on early streets.
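The decomposition in the comment above can be sketched numerically. The helper below is hypothetical and rests on the stated simplifications: we are last to act, the opponent never raises, and hands that fold would have lost at showdown anyway (so folds are EV-neutral versus checking).

```python
# EV of betting minus EV of checking it down, under the assumptions
# stated above: last to act, villain never raises, and folded hands
# would have lost at showdown anyway (folds gain us nothing).
def value_bet_gain(bet: float, p_call: float, q: float) -> float:
    """q = P(we are ahead | villain calls).

    When called we win `bet` with probability q and lose `bet` with
    probability 1 - q, so the gain over checking is p_call*bet*(2q - 1).
    """
    return p_call * bet * (2 * q - 1)

# Thin value: ahead 55% of the time when called -> still profitable.
print(value_bet_gain(bet=100, p_call=0.4, q=0.55))   # 4.0
# Ahead only 45% when called -> betting loses vs checking down.
print(value_bet_gain(bet=100, p_call=0.4, q=0.45))   # -4.0
```

This recovers the classic rule of thumb that a pure value bet needs to be ahead more than half the time it is called; note there is no fold-equity term anywhere, which is exactly the contrast with semi-bluffing.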


I should also say my explanation assumes we're in last betting position here. If we're first to act, the logic is somewhat more complicated than I can successfully conjure up here without going back into the books - it's been many years since I played seriously.


Ahh, ok thanks.


Good to see poker theory approaching the way I naturally thought when I played on PokerStars (before Chuck Schumer and co shut it all down for us).

People used to make fun of me for playing limits and sucking out on people. And no limit I would have some crazy strategies to fake out a table and make them anger-bet into my connected straight flush draws, when I was “representing” a two pair etc.

On large tables, often you want many people to stay in until the river, because then the pot is big enough that multiple people will go all-in. You just have to be very sure you probably have the best hand (such as a flush that you were drawing for). I would actually "goad" people by raising small amounts whenever I had suited connectors, for example, and if I hit a draw or the nuts on the flop, I would act as if I were "protecting" a top pair or so. On the other hand, when you are on a draw and want to see the next card on the turn, you have to bet the flop much harder, because that signals to most "basic" people that you're "protecting" a hand — therefore gaining you an informational advantage when you do hit — as well as the "intimidation" factor that lets you "check" the turn to see the river — because they often think you're check-raising them and also check, giving you another free card. You pay more on the flop but get two free cards after. Also, if you don't hit your draw, sometimes they didn't hit either, and you can take it down with an aggressive bet since you were "representing" that you flopped a strong hand. To summarize:

  1) play large tables
  2) play suited connectors, or suited Ace-something
  3) limp in, or if you are in late position, double the bet to make people call and grow the pot and be more excited to bet later; you also represent a reasonably high pair
  4) after a flop - if you are one card away from hitting a very strong draw (eg ace high flush) then bet hard on the flop — late position is always best for this… otherwise you have to bet less to keep people in
  5) regardless of whether the turn card makes your draw, check on the turn … you may have to call any bet if multiple people stay in, and play the odds
  6) if you made the nuts on the river, check-raise all in (since you represented that you are protecting a flopped hand).
  7) there were also timing factors, people get more annoyed if you take too long. And you can play multiple tables online until you hit these kinds of starting hands
People said I was playing wrong but I intuitively felt like this would give a lot higher payoffs than just straight play with no bluffs.

PS: Against a really intelligent table where people don’t churn, you’d have to leave after a couple big wins and do the same thing elsewhere. The whole idea in poker is to take advantage of most people just executing some common strategy or mindset.


> People said I was playing wrong but I intuitively felt like this would give a lot higher payoffs than just straight play with no bluffs.

No winning players nowadays would advocate for a strategy that doesn't involve bluffing.

Not to be harsh, but just to point it out for any novices who might be seduced by the simplicity of what you propose: the "strategy" you outline is simply nonsense and would not be winning in the long run even vs a fairly weak field. Maybe vs the level of play in the pre-solver world it did okay (though I doubt it would win except vs the very weakest fields), but today it would simply be burning money, even at microstakes.

Some of the heuristics you describe work in some situations, but you make no mention of accounting for other players' positions, other players' ranges, or the cards on the board; it is simply based on your hand and your perception of the player population's tendencies (which have changed dramatically since the pre-solver days). Ignoring the majority of the publicly available information in a given hand, in this game of incomplete information, is a grave strategic mistake.


Cool tip thx. How did you do with this strategy?


I am the solver programmer mentioned in the article. I think it came out pretty decently. Maybe not all the details are right (I was late answering fact checking email) but I think it's a decent read and the author has done a good job!


How perfect is the play that piosolver calculates? Do you make any simplifying assumptions?

I noticed in the article that piosolver can incorporate a mixed strategy with multiple bet sizes. Pretty neat, is that a recent feature? As I remember it, a few years back people would discuss what the optimal bet size was in various situations, with the implicit assumption that there is just one.


Adding multiple bet sizes was one of the first features we added in 2015 - so not very recent, no. The one true correct bet size is a seductive idea, but it just doesn't exist for most situations. You can try finding the best one under the assumption that you can only use one, but that's not the best idea in my opinion. The main reasons are that that's the sizing most people will be most ready to play against, and the EV difference between possible bet sizes is minimal.

What happened in chess is now happening in poker: at first people were all about the best theoretical move, and then a gradual shift started toward "not necessarily the best, just sound enough but less likely to be analyzed by the opponents".

As to the first part of your question: the solutions are pretty much perfect. The only assumption is about possible bet sizes. As to accuracy we measure it in theoretical exploitability (how much a perfect adversary who knows our strategy exactly could win against us). You can easily go so low that even a theoretical perfect adversary wouldn't get close to beating the rake vs the solution, even at high stakes.
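The exploitability measure described here (how much a best-responding adversary who knows your exact strategy can win) can be illustrated on a toy game. The sketch below uses rock-paper-scissors rather than poker, purely to show the definition; the function name is mine.

```python
# Toy version of the exploitability measure: the best-response value
# an adversary who knows our fixed mixed strategy can achieve.
# Illustrated on rock-paper-scissors, not poker.
payoff = [  # our payoff: rows = our action, cols = adversary's action
    [0, -1, 1],   # rock     vs rock / paper / scissors
    [1, 0, -1],   # paper
    [-1, 1, 0],   # scissors
]

def exploitability(strategy):
    """Adversary's best-response EV vs our fixed mixed strategy
    (0 means unexploitable, i.e. an equilibrium strategy)."""
    # zero-sum game: adversary's payoff is the negative of ours
    adv_evs = [-sum(strategy[i] * payoff[i][j] for i in range(3))
               for j in range(3)]
    return max(adv_evs)

print(exploitability([1/3, 1/3, 1/3]))      # 0.0 -> equilibrium
print(exploitability([0.5, 0.25, 0.25]))    # 0.25 -> exploitable
```

In the poker setting the same quantity is reported per hand (e.g. in big blinds), which is what makes the "below the rake" comparison meaningful.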


Congrats on putting this together, it's not easy! I'm curious, what papers have you used for your implementation? I know there are a couple of reimplementations of DeepStack out there.

Disclaimer: I'm doing a PhD in this area, generalizing to harder games than poker.


The solver was purely based on our (mine and my partner's) ideas. We tried reading some papers to find ideas to improve our solver, but we found that they were not very helpful - written in opaque language, using the wrong tree representation, and not focusing on practical implementation aspects. Maybe things have changed since then, but I haven't found anything useful at all in published papers about poker, with the exception of a two-page one by Oskari Tammelin describing the CFR+ algorithm (which we haven't used in our software because of its significant memory requirements).

We considered it very unlucky that the Science paper was published, as we thought more people would implement solvers then (ours was ready in mid 2014, and we released a working solver with a GUI in early 2015, 3 months after the Limit Hold'em paper). As it turned out, though, it didn't really matter much.

I had the idea to code a poker solver around 2008. Unfortunately, I had yet to learn to code back then. The biggest challenge was to overcome implementation issues. Once I got the crucial part to be fast enough, I knew the solver was possible. I think not having much background in the field allowed me to come up with a more natural (and better) tree representation and a memory-efficient (even if not the fastest) algorithm, as I was thinking more as a poker player than a computer scientist back then.


Interesting! We very recently wrote a paper on the various tree representations [1]. Maybe it is the same distinction you found on your own? I judge this based on the screenshots on your website. Public state trees are amenable to nice factorization over the hands, and indeed most implementations use them, but formally papers write that they use the history tree notation.

[1] https://arxiv.org/abs/2112.10890


It's a paper so it will take me a while to go through but that sounds more or less right from your description. The history tree notation and the whole information sets nonsense just make things very confusing to read and think about.


Congrats - PIOSolver is an amazing program. What are the future plans for it? It would be great if you didn't need a PhD to set it up though. I keep wondering if I'm actually training myself using the right solutions or not based on the setup :)


The plans are to start turning it into a real company, which hopefully means easier-to-use software, more learning material, and better communication. Till now it was mostly two guys working from home and overwhelmed by it all.


In 2016 you said "It's a very small niche and getting less popular but we are still doing very well." Did this turn out to be true, or has PioSOLVER become more popular again?


I think poker in general is slowly dwindling, but somehow the solver has in fact gotten more popular over the years. I think the pandemic helped us a bit, and maybe a general switch to more analytical play. To be honest, I don't understand it very well. We have never done any marketing, our website is terrible (put together on a free Shopify template and never really updated), but somehow it keeps getting more traction.


How did you end up working on this?

It’s an interesting history.


I was always interested in math and encountered game theory concepts early in life. It's a perfect mix of being not a very good programmer, not a top math mind, not an exceptional poker player but still being all those things at the right time and place.


I really like that you got to do this, and that you turned your weaknesses into such a great strength. Thank you for showing that you don’t need to be exceptional in a specific area to do exceptional work. Best of luck with whatever you decide to pursue next.


Great job! What makes PioSolver much more popular than the other solvers?


I like how the article gives examples of the sort of entropy players draw on to make their random choices:

> Koon will often randomly select which of the solver’s tactics to employ in a given hand. He’ll glance down at the second hand on his watch, or at a poker chip to note the orientation of the casino logo as if it were a clock face, in order to generate a percentage between 1 and 100.


The watch face is the classic one, the chip orientation is new to me. I like it, it's less obvious that you're randomizing something, and given how much poker players like to fidget with their chips the orientation is probably reasonably random.

Another classic is to use the suit of your cards, but that has problems with being correlated with the state of the board. It works fine preflop.
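The watch-face trick maps an observation with roughly uniform distribution onto a 1-100 scale and compares it to the solver's mixed-strategy frequency. A minimal sketch (the 33% bluffing frequency is a made-up example, not solver output):

```python
import random

# Sketch of the live randomization trick: read a "clock" value
# (seconds hand, or a chip logo's orientation treated as a clock
# face), map it onto 1..100, and act on the solver's frequency.
# The 33% frequency here is a hypothetical example.
def pick_action(clock_seconds: int, bluff_pct: int = 33) -> str:
    """clock_seconds in 0..59 -> a roll in 1..100 -> mixed action."""
    roll = int(clock_seconds / 60 * 100) + 1  # 1..100, roughly uniform
    return "bluff" if roll <= bluff_pct else "check"

# Online you would just use a PRNG instead of a watch:
print(pick_action(random.randrange(60)))
```

The appeal of the chip-orientation version is exactly what the comment says: the "read" happens inside a gesture players make constantly anyway, so it leaks nothing.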


But it hasn't conquered it! I kept searching the article for some new recent breakthrough that I've missed but it's not there. Yes, solvers like Pio have been around for years and limit holdem has been essentially solved for a while but nobody plays limit holdem anyway.

The two most popular games (no-limit Texas holdem and pot-limit Omaha) are still unsolved.


Bots are superhuman in no-limit Texas hold'em. Libratus beat top humans in two-player in 2017 and Pluribus beat top humans in six-player in 2019: https://www.science.org/doi/abs/10.1126/science.aao1733 https://www.science.org/doi/abs/10.1126/science.aay2400

It's shocking that the reporter didn't mention these results or anything else more recent than 2015.


Have to say, Pluribus beating top humans for 10,000 hands isn't the same thing as being superhuman. It's just too small a sample to make that claim.

Further, thousands of the hands that Pluribus played against the human pros are available online in an easy to parse format [0]. I've analyzed them. Pluribus has multiple obvious deficiencies in its play that I can describe in detail.

It seems very difficult to set up any kind of proper, repeatable, controlled experiment involving something as random as poker. Personally, I would be much more convinced if Pluribus played against real humans online and was highly profitable over a period of several months. This violates the terms of service / rules of many online poker sites, but it seems like the most definitive way to back up terms like "solved" or "superhuman".

[0] http://kevinwang.us/lets-analyze-pluribuss-hands/


Normally 10,000 hands would be too small a sample size but we used variance-reduction techniques to reduce the luck factor. Think things like all-in EV but much more powerful. It's described in the paper.
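A heavily simplified illustration of the principle (this is NOT the actual AIVAT computation, which is described in the paper's supplementary material): if observed winnings mix a fixed skill edge with a large luck term whose expectation is known to be zero and whose value can be computed per hand (all-in EV is the familiar special case), subtracting that luck term shrinks the variance of the estimate without biasing it.

```python
import random
import statistics

# Toy control-variate demo of variance reduction (not actual AIVAT).
# Observed winnings = fixed skill edge + computable zero-mean "luck"
# + noise we cannot compute. Subtracting the known luck term shrinks
# variance without biasing the estimated edge.
random.seed(0)
N = 10_000
true_edge = 5.0
luck = [random.gauss(0, 100) for _ in range(N)]   # computable, mean 0
noise = [random.gauss(0, 10) for _ in range(N)]   # not computable
results = [true_edge + l + n for l, n in zip(luck, noise)]

adjusted = [r - l for r, l in zip(results, luck)]  # luck removed

print(statistics.stdev(results))    # ~100: edge drowned in variance
print(statistics.stdev(adjusted))   # ~10: edge easily measurable
```

With a tenfold reduction in standard deviation, the effective sample size grows a hundredfold, which is the sense in which 10,000 adjusted hands can carry the statistical weight of a far larger raw sample.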


Where is the variance-reduction technique discussed? I looked at this paper

https://www.science.org/doi/abs/10.1126/science.aay2400

https://par.nsf.gov/servlets/purl/10077416

and it just says

> Finally, we tested Libratus against top humans. In January 2017, Libratus played against a team of four top HUNL specialist professionals in a 120,000 hand Brains vs. AI challenge match over 20 days. The participants were Jason Les, Dong Kim, Daniel McCauley, and Jimmy Chou. A prize pool of $200,000 was allocated to the four humans in aggregate. Each human was guaranteed $20,000 of that pool. The remaining $120,000 was divided among them based on how much better the human did against Libratus than the worst-performing of the four humans. Libratus decisively defeated the humans by a margin of 147 mbb/hand, with 99.98% statistical significance and a p-value of 0.0002 (if the hands are treated as independent and identically distributed), see Fig. 3 (57). It also beat each of the humans individually.


>The remaining $120,000 was divided among them based on how much better the human did against Libratus than the worstperforming of the four humans.

Surely the correct strategy here is for the human players to collude to give as much money as possible to a single player and then split the money afterwards, no?

Also, the fact that the players can only gain money without losing anything likely changes their play somewhat. By default I'd assume (and have generally observed) that most players on a freeroll (or better than a freeroll, really) tend to undervalue their position and gamble more than is usually wise.

I'd definitely be interested in seeing a "real" game where the humans are betting their own money.


The four humans were getting $120,000 between them. Their share of that was dependent on how much better they did than the other humans. That means there was no incentive to collude.

Top pro poker players understand the value of money. They weren't treating it as a freeroll and anyone that has seen the hand histories can confirm that.


It's in the supplementary material of the 2019 paper: http://www.cs.cmu.edu/~noamb/papers/19-Science-Superhuman_Su... . Look at the "Variance reduction via AIVAT" section.


Do you think human players could use the results of this paper to learn how to be better poker players? I'm wondering if it could be an alpha go type situation where players learned different strategies.


Noam is being too humble, but he's one of the primary creators of Libratus/Pluribus, as an FYI.


The article was fascinating but honestly read like an ad for PioSolver


The journalist who contacted me told me he did so because the software kept coming up when he talked to pro players. While it's certainly not the one that advanced the science the most (talk to Noam Brown if you want that), nor the fastest (talk to Oskari Tammelin about that), it's still very popular and was the first to get a big following. It changed the game and got into the online poker culture. I am quite proud of that, and I think it deserves to be mentioned a lot in an article about how computers changed poker.


I know that PioSolver is not a "poker AI" per se, but the article seems to say it can tell you what to do based on the table situation. Has anyone tried pitting pro players against PioSolver?


PioSolver requires putting in the hand range of the opponent, so the quality of PioSolver's solution is largely down to how accurate the guess at that hand range is. But if a pro knows he is playing against PioSolver configured with a certain hand range, he can just change his strategy to adapt. In theory, though, if PioSolver knows the correct hand range, then it shouldn't be possible to do any better than tie against it, given enough hands.


There is still a lively academic community and major progress! Check out CMU's no limit results [1]. (I realize articles like this have to pick some angles to make it interesting, but it was weird to see only dated research mentioned.)

But if you are rooting against the machines, don't worry: it is almost certainly impossible to calculate a full equilibrium policy for no limit multiplayer, so we will instead be debating over the virtues of various types of imperfection for a long time. And even if an Oracle gave us convenient access to equilibrium strategy, it would still not be the optimum at a table full of imperfect players. Your poker game is safe for a while!

[1] https://www.nature.com/articles/d41586-019-02156-9


It doesn't even matter if you can calculate multiplayer equilibrium. It's not the solution the same way it is in heads-up. You can still lose if you employ the equilibrium in multiplayer unlike in HU.


That's not true in practice for poker. Pluribus showed that if you run CFR in multiplayer poker you get a solution that works great in practice. Multiple equilibria are certainly a theoretical issue for many games, but poker conveniently isn't one of them.


It's not about multiple equilibria but about (often unintended) collusion. Examples of that affecting poker games are very well known. One frequently occurring example was discussed in the online community 15-20 years ago (BTN raises in a limit Hold'em game, SB calls too much, which hurts both the SB and the button while giving equity to the BB).

I don't think you're correct saying it doesn't affect poker as people were able to notice and analyze this before solvers. It's true though that no-limit Holdem as played today (two blinds,no ante,deep stacks) is likely not strongly affected by the phenomena. I don't agree Pluribus experiment shows much when it comes up practical play. Not enough variety of skill levels, not enough hands and not enough time for metagame (people adjusting to how others play) to develop. I do agree pure equilibrium play is most likely not terrible in cash game nlhe but definitely not in poker in general.


No limit holdem has been essentially solved. Pluribus & co notwithstanding, you just haven't heard of it because the people who have solved it are busy printing money in online poker (yes, I know sites try to detect bots, and no, they can't detect them all). With stakes this high, academic progress lags the 'actual' state of the art by years.


IIRC it's "solved" for heads up but not really multiway, like 3+ to the flop. I believe in a recent Bart Hanson YouTube video he points out that multiway is not solved.


oh wait, Google showed me this:

> Machines have raised the stakes once again. A superhuman poker-playing bot called Pluribus has beaten top human professionals at six-player no-limit Texas hold’em poker, the most popular variant of the game. It is the first time that an artificial-intelligence (AI) program has beaten elite human players at a game with more than two players

https://www.nature.com/articles/d41586-019-02156-9#ref-CR1


It's not solved for multiway in the sense that the optimal move in each situation isn't known, but there are AIs like Pluribus that have superhuman performance.


Superhuman performance in a very constrained version of no-limit hold'em


Those constraints don't seem that strong to me. The algorithm could've just been retrained with different stack sizes if it ended up making a big enough difference.


Hasn't Facebook's Pluribus come pretty close to "solving" no-limit holdem? I don't know a lot about poker so I can't really assess their claim's validity.


Not even mentioning Pluribus is just bad journalism


The article is mostly about a change in the industry and the adoption of solver strategies by human players of all levels.

Pluribus had approximately zero impact on that.


Worth noting that PioSolver isn't the only tool of its kind - there's also Jesolver (http://jeskola.net/jesolver_beta/), a speed-optimized variant, and the cloud-based Deepsolver (https://deepsolver.com), which leverages neural nets to do the job.


<<But for Doug Polk, who largely retired from poker in 2017 after winning tens of millions of dollars, the change solvers have wrought is more existential. “I feel like it kind of killed the soul of the game,” Polk says, changing poker “from who can be the most creative problem-solver to who can memorize the most stuff and apply it.”>>

I wrote probably the first online Texas holdem game, played on IRC, around 1994. Back then, blackjack teams were forming in rec.gambling, and instead of joining one I started to play poker, because it was more fun than counting cards. I wrote the game so I could play against other people, and was immediately a winning player in the low limit games of the time in Lake Charles and Vegas. However, even in the 90s, it was about statistics.

I do regret not allowing people to play for money (people were constantly asking).

Even back then the primitive bots were mostly killing human players.


I'm pretty sure it was Vanessa Selbst who said that, not Doug Polk


I think there's room for a "battlebots meets online poker" product. Imagine buying your bot a real money budget and setting it loose in a tournament to combat other bots.


This has happened before, but only once that I know of: the 2007 Poker Bot World Championship (PBWC), hosted by the creator of the (apparently now defunct) WinHoldEm "poker botting" software, Ray E. Bornert II. It never ran again; only a small handful of the more experienced bot authors attended, turnout was poor, and the highest max buy-in was $100 (http://robopoker.blogspot.com/2007/12/pokerbot-world-champio...).

The online poker community is extremely hostile towards poker bot software in general; you would be very hard-pressed to find an existing poker website that would be willing to encourage and host such tournaments.


Online poker has already been evolving in this direction. Any serious player who plays online will use player assistance tools (such as PIOsolver). Top level poker AI such as Pluribus has already proven that it can hold its own against and beat the best online players, such as linuslove. It is well known that online poker sites are full of poker bots now. There are programs that help users tell if they are playing against a bot by watching how fast it acts - for example, whether it always takes a fixed amount of time on each hand or follows a pattern.


Yeah, that was the evolution of my thought process. Online poker is already full of bots, which will get more and more sophisticated over time and defeat bot detection hurdles -- so why not embrace it and have bot-only zones? You could have entire tournaments play out in less than a minute if all participants were software.


because humans are the worst players in most cases now. Especially at the lower limit tables. Obviously, you want to play the worst players to make the most money.


I would expect bots to add random delays? They could even determine real player delay distributions and emulate that.
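A sketch of what that might look like: sample think times from a log-normal distribution with an occasional long "tank" on hard decisions. All parameters here are invented placeholders, not fitted to real player data:

```python
import math
import random

def human_delay(base=1.2, sigma=0.6, tank_prob=0.05):
    """Sample a think time in seconds from a log-normal distribution
    (median ~= base), with a rare long 'tank' mixed in to mimic a
    genuinely difficult decision. Constants are made up for illustration."""
    delay = random.lognormvariate(math.log(base), sigma)
    if random.random() < tank_prob:
        delay += random.uniform(5.0, 20.0)  # occasional long dwell
    return delay
```

A detector looking for fixed response times or too-regular patterns would see a heavy-tailed, human-looking distribution instead; catching this requires comparing the full distribution against real player data, not just the mean.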


yes, that is the obvious step that bot makers have taken. When bots first appeared on the online poker scene, nobody even bothered to check. Of course, there are still other ways to check for bots, such as a user playing for an unreasonable amount of time or an extraordinary number of tables, or simply not responding to chat.


The natural next step is adding ELIZA-like chat responses to your bots


"Tell me about want to kick my ass."


GPT-3 would be interesting, especially once multiple bots start chatting to each other.


I'm actually working on this as a side project. I've been dragging my feet as I wasn't sure anyone would want to "code to play" and had no real marketing plan.

It's a fun project though, especially trying to design it in a way that proves I'm not manipulating the deck behind the scenes.

Email in profile if anyone is up for being an alpha player.
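One common approach to "provably not manipulating the deck" is a commit-reveal scheme. A toy sketch (my illustration, not this project's actual design; a production protocol would also mix in player-contributed entropy so the house can't grind seeds):

```python
import hashlib
import random

def deal_with_commitment(num_cards=52):
    """Publish a hash of the shuffle seed before the hand, reveal the seed
    afterwards, so players can verify the deck order was fixed in advance."""
    seed = random.getrandbits(256).to_bytes(32, "big")
    commitment = hashlib.sha256(seed).hexdigest()  # publish this before dealing
    deck = list(range(num_cards))
    random.Random(seed).shuffle(deck)  # deterministic shuffle from the seed
    return commitment, seed, deck

def verify_deal(commitment, seed, deck):
    # Anyone can re-derive the deck from the revealed seed and check the hash.
    if hashlib.sha256(seed).hexdigest() != commitment:
        return False
    expected = list(range(len(deck)))
    random.Random(seed).shuffle(expected)
    return expected == deck
```

Since the commitment is published before any cards are seen, the operator can't reorder the deck mid-hand without the reveal failing verification.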


I've been thinking about a similar project. Do you intend to have it be only bots? I think it would be interesting to also allow human players to play against the bots (perhaps incentivized by getting preferential rakeback deals compared to the Bot players?). Many players fret about bots on other sites; it would be cool to enable those of us who want the challenge play them knowingly and with a slight financial handicap :)


I sketched out plans for the idea a while back (and have been too busy to do anything about it). I'd have bot-only games, and mixed games. I'd also have tools built into the game to let you pull up full stats on any player (there's add-on software right now for major platforms that gives an asynchronous advantage to players who use it, so mitigating that would require building it in). I think if the stats were good enough it'd be relatively easy to spot bots, which might make human-only tables possible as well. The system would be punitive against bots that didn't declare themselves as such, but it would be fully supportive of bots being part of the game in the online sphere.


Both could work. I’m particularly interested in the all bot variety because the pace can be very accelerated. For example, an idea I’m kicking around is having timeout be under 1 second.

I see it like HFT. You may have multiple bots running multiple playing strategies and an entire tournament could occur in minutes/hours. A large weekly tournament might even be like a sporting event for hackers.

The human vs bot scenario you’re interested in would be possible on the same platform with a longer timeout period.


I think the real money aspect would obviously get you some traction, but that's difficult to deal with (since you might run into online gambling laws, although you could very credibly argue that this isn't really any different from algorithmic trading). Maybe you could get in on the hype train and launch it as a crypto product?


There's a "Pokerbots" class/competition at MIT (during January aka Independent Activities Period), along with a robotics competition and a real time strategy video game bot competition. I did both the robotics and poker ones, which were a ton of fun!


But hasn't the problem been solved? Two bots would play perfectly and walk away with roughly 50% of the pot.

This article is about how humans have been memorizing the variations - akin to chess openings - in order to play as perfectly as possible.


No, because you have the randomness of the deck, and you have bluffing. Bots could be tuned to be aggressive or defensive, or switch strategies depending on conditions in the game.
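Even the bluffing isn't free-form at equilibrium, though: in the classic river toy game, the bluff frequency is pinned down by the bet size. A small sketch (textbook simplification, ignoring card removal and earlier streets):

```python
def equilibrium_bluff_fraction(pot, bet):
    """Fraction of the bettor's range that should be bluffs so the caller
    is indifferent: the caller risks `bet` to win `pot + bet`, so their
    break-even call needs to win bet / (pot + 2 * bet) of the time --
    which is exactly how often the bettor should be bluffing."""
    return bet / (pot + 2 * bet)

# A pot-sized bet should be a bluff one third of the time; bigger bets
# license more bluffs, which is part of why solvers like big river sizings.
```

Two equilibrium bots following this would still swap pots constantly on card luck; only over a large sample does the EV wash out to a tie.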


Isn't that just algorithmic trading?


Classic games tend to be more fun than New York horoscopes. For me at least, there's something fundamentally profound about solving a logical puzzle better than anyone in history, that just isn't there in guessing which way some lines are going to go.


What teaching or training tools are out there for a very average no limit Texas hold’em player who just wants to get a bit better, to a respectable level at a modest time commitment, and does not need to be a pro-level player?


Yeah, really I just want an open-source poker solitaire that asks me what I should do and then shows me the odds after I answer.


The odds are a small part of what you have to consider.

If you want to quickly improve to a proficient level there's no quicker way than to read Theory of Poker and Hold 'em for Advanced Players. Both of these books focus primarily on limit poker, but the concepts are critical for no limit as well. And you'll realize there's a lot more strategy and nuance in limit poker than you thought.
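For the odds piece itself, the immediate pot-odds arithmetic those books start from is tiny. A sketch (deliberately ignoring implied odds and future streets, which is where most of the real skill lives):

```python
def required_equity(pot, bet_to_call):
    """Break-even equity for a call: you put in `bet_to_call` to win a
    final pot of `pot + bet_to_call`. Here `pot` already includes the
    opponent's bet."""
    return bet_to_call / (pot + bet_to_call)

def is_profitable_call(equity, pot, bet_to_call):
    # Naive immediate-odds rule: call whenever equity beats the price.
    return equity > required_equity(pot, bet_to_call)
```

Example: facing a $50 bet into a $100 pot (so $150 out there), you need 50/200 = 25% equity to call, before accounting for any later betting.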


ask and you shall receive: https://gtowizard.com/ (go to the "practice" section for what you are looking for)


Books (there's very few good ones)

Training videos

1-on-1 coaching

PioSolver (and other similar software)

For any of these to stick, you need to spend some amount of time studying by yourself; just consuming learning material and playing isn't enough.


Could you recommend a couple of books for a start?


Modern Poker Theory, as mentioned by the other reply, is good. I'd also suggest Play Optimal Poker by Andrew Brokos. The former outlines a lot of strategy in the game and talks a little about game theory and its implications. The latter gives less direct poker advice but uses "toy games" that share some similarities with poker, simplified to demonstrate some of the implications game theory has for poker. It does contain some direct poker advice, but it's more a book that teaches you how to think about how game theory applies to poker, so that you can study with something like a solver and actually understand what you're seeing.

I would suggest you pick up both. The Brokos book assumes you know certain poker terms and doesn't contain a glossary, whereas Modern Poker Theory is almost like a textbook and has a section where it defines all the terms it uses. Modern Poker Theory will have more actionable advice, but the Brokos book is excellent for teaching you how to think about game theory; it requires more self-reflection and poses more questions to the reader.

Applications of No-Limit Hold 'em by Janda and The Mathematics of Poker are recommended by a different reply here. I would caution that these are extremely academic, extremely dense texts that would be a very tough read for a newer player. Mathematics is less practical, more a math book than a poker book, and Applications is Janda working out solver-like solutions before solvers existed, with a lot of math along the way. I think the two above are more practical and aren't going to lead you to put the book down a tenth of the way through.


The Mathematics of Poker, by Chen & Ankenman - this one doesn't age because it's really fundamental if you want to understand things from the ground up. But it won't seem that practical at first.

Applications of No-Limit Hold 'em, by Matthew Janda - this one goes into specifics about how to think on each street, how to build a good strategy etc. Some of the details will be different from the latest theory because the book was released in 2013 but it's still a very solid read if you want to level up.


Modern Poker Theory by Michael Acevedo is considered a contemporary gold standard.


If you like video content, the Daniel Negreanu Masterclass is extremely well done. Phil Ivey also has content on that site, but I found it much less comprehensive.

After that, perhaps watch some Doug Polk on Youtube to see how he uses the concept of putting people on ranges of hands and what his thinking is in a specific spot.


Training sites are probably going to be your best bet to improve quickly. I see Run it Once was mentioned. I've personally used Upswing Poker in the past and they had some great resources. Bart Hanson's Crush Live Poker videos are great too but I can't vouch for the training course.


www.runitonce.com - the $25/mo essential membership will easily give you enough to go from beginner to semi-pro.


Lol, awesome. I built the Stripe integration for this site in 2013. The first day we launched, we landed over $100k in subscriptions.


Is there any remaining logic game in which AI hasn't become superhuman? IIRC Monte Carlo tree search has done wonders in this area.


As far as I know, in Magic: the Gathering, the best bots are far worse than most players. Part of the difficulty is that the rules are so complicated that there are only a couple of complete rules implementations. Beyond that, it's an imperfect information game with far more actions per game than poker, so optimal-solver techniques haven't seen success.


I think this is purely a resource issue, e.g. if Google Brain decided to make an MtG bot I would be fairly confident it would be superhuman. Even real time strategy games like Starcraft are looking like they're on the cusp of superhuman bots (Alphastar was competitive as Protoss against elite players, but did not consistently beat them).


I doubt very highly it would be able to sit at a game of commander. Four players with 100 card singleton decks would be an absolutely enormous space to operate in.


Many players does sound like it's still a hard nut to crack for AI approaches (although, as poker demonstrates, it's getting easier), but the deck size doesn't sound like it'd be the main issue.


Alphastar also didn't play with the same limitations that a human has. Even after removing its ability to see the entire map and finally forcing it to scroll around, alphastar never misclicks (so its APM==EPM) and can still blast nearly unlimited APM for short bursts as long as its "average APM" over an x-second period matched human's APM.

I believe Alphastar would generate more interesting strategies if we limited alphastar to a bit below human APM and forced it to emulate USB K+M to click (instead of using an API, which it currently does) and adding a progressively increasing random fuzzing layer against its inputs so that as it clicks faster the precision/accuracy goes down.
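A fuzzing layer like that is straightforward to sketch. Something along these lines (all constants invented for illustration) would make precision fall off as the action rate rises:

```python
import random

def fuzz_click(x, y, recent_apm, base_px=2.0, px_per_apm=0.05):
    """Hypothetical input fuzzer: add Gaussian noise to a bot's click
    coordinates, with spread growing with recent APM, so bursting faster
    costs accuracy the way it does for a human. Constants are made up."""
    sigma = base_px + px_per_apm * recent_apm
    return x + random.gauss(0.0, sigma), y + random.gauss(0.0, sigma)
```

At a calm 60 APM the noise is a few pixels; during a 600 APM burst it grows enough that perfect stalker-juggling micro stops being free.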

By "interesting strategies" I mean strategies that humans could learn to adopt. Currently its main strategy is "perfectly juggle stalkers" which is a neat party trick, but that particular strategy is about as interesting to me as 2011-era SC AI[0]. Obviously how it arrived at that strategy is quite interesting, but the style of play is not relevant to humans, and may in fact even get beaten by hardcoded AI's.

I'm also very curious what Alphastar could come up with if it were truly unsupervised learning. AIUI, the first many rounds of training were supervised based on high level human replays -- so it would have gotten stuck in a local minimum near what has already been invented by humans.

This may be relevant if Microsoft reboots Blizzard's IP. I would love to have an alphastar in SC3 to play against off-line, or have as a teammate, archon mode, etc. I think all RTS' are kind of "archon mode with AI teammate" already. The AI currently handles unit pathing, selection of units to attack, etc. With an alphastar powering the internal AI instead, more tactics/micro can be offloaded to AI and allow humans to focus more on strategy. That seems like it would be super cool.

Examples: "Here AI, I made two drop ships of marines. Take these to the main base and find an optimal place to drop them. If you encounter strong resistance or lots of static defense, just leave and come back home"

"Here AI, use these two drop ships of marines to distract while I use the main army to push the left flank. Take them into the main, natural, or 4th base -- goal is to keep them alive for as long as possible. Focus on critical infrastructure/workers where possible but mostly just keep them alive and moving around to distract the opponent."

0: Automaton 2000 AI perfectly controls 50-supply zerglings (2.5k mineral) vs. 60-supply (3k mineral, 2.5k gas) siege tanks: https://www.youtube.com/watch?v=IKVFZ28ybQs


> Alphastar also didn't play with the same limitations that a human has. Even after removing its ability to see the entire map and finally forcing it to scroll around, alphastar never misclicks (so its APM==EPM) and can still blast nearly unlimited APM for short bursts as long as its "average APM" over an x-second period matched human's APM.

> I believe Alphastar would generate more interesting strategies if we limited alphastar to a bit below human APM...

No, Alphastar definitely had misclicks, and it had a maximum cap on APM regardless of average that was far lower than the max burst of APM (or even EPM) of top players. When I have the time I can go dig up some games where Alphastar definitely has misclicks, and I believe the Deep Mind team has said before that it will misclick. Its APM limits are already lower than pros' both on average and in bursts (and are reflected in its play: Alphastar will often mis-micro units in larger, more frantic battles, such as allowing disruptor shots to destroy its own units, but it will never make the same mistake with much smaller numbers of units).

> Currently its main strategy is "perfectly juggle stalkers"

Definitely not. That was its strategy in its early iterations against MaNa and is no longer feasible with the stricter limitations in place. Its Protoss strategy is significantly more advanced than that now (see its impressive series of games against Serral with an amazing comeback here: https://www.youtube.com/watch?v=jELuQ6XEtEc and a powerful defense against multi-pronged aggression here: https://www.youtube.com/watch?v=C6qmPNyKRGw) (and of course by "now" I mean when Deep Mind took it off the ladder). Both of these involve an eclectic mix of units with Alphastar effectively using each type of unit and varying it in response to what Serral puts out and its own resource constraints.

A lot of commentators have difficulty distinguishing Alphastar from humans when the former plays as Protoss (its Terran and Zerg play is weaker and often more mechanical).

> I mean strategies that humans could learn to adopt.

My main takeaways from watching Alphastar were "pros undervalue static defense and often have a less than optimal number of workers (where Alphastar's seeming overproduction of workers lets it shrug off aggressive harassment)," but I don't know if those have picked up in the meta.


Why is it far worse with Terran and Zerg?


I don't know enough to answer "what mechanisms of how the AI works would cause it to be worse at Terran and Zerg."

If the question is rather "what characteristics of Alphastar's Terran and Zerg play style make me say that its Terran and Zerg play is worse than its Protoss play," the simplest answer is that Alphastar just feels a lot more like a bot. Unlike when playing Protoss, it seems to get into certain "ruts" of unit composition and tactics that are a bad match for the opponent it's facing and can't seem to reactively change based on how the game is going, whereas with Protoss it seems more than happy to change its play style over the course of the game based on what the opponent is doing.


Edit 2: Reading through the "supplementary data" of the 2019 paper, it definitely appears that the AlphaStar which reached grandmaster was not limited in the same ways as the 2017 paper would suggest. x/y positions of units are not determined visually, but fed directly from the API. So AlphaStar absolutely can just run Attack(Position: carrier_of_interest->pos.x) and not mis-click. Its "map" / "vision" is really just a bounding box of an array of every entity/unit on the map and all the things that a human would have to spend APM to manually check (precise energy level, precise hit points remaining, exact location of invisible units, etc). See [7]. DeepMind showed they have some fixed time delays to emulate human experience; if they had a position fuzzer, they would have mentioned it. I'm reasonably convinced they gave AlphaStar huge advantages even in the 2019 version that was 'nerfed' from the 2018 version. The 2017 paper was a more ambitious project IMO that didn't quite get fully developed.

Edit: 7 minutes after writing this I re-read the original paper[-1]. https://arxiv.org/pdf/1708.04782.pdf pages 6 and 7 make it clear that DeepMind limited themselves to SpatialActions, so they cannot tell units "Attack Carrier" but have to say "Attack point x,y" (and x,y also has to be determined visually, not through carrier_of_interest->pos.x). It's still not clear in the paper if any randomness is added to Attack(x,y).

Additionally, I have some serious concerns about assuming that the design decisions made in this 2017 paper were actually used in the implementation of the 2019 Alphastar demo vs TLO and MaNa. The paper claims "In all our RL experiments, we act every 8 game frames, equivalent to about 180 APM, which is a reasonable choice for intermediate players." I would agree with this choice! But [5][6] indicate that Alphastar's APM spiked to over 1500 APM in 2019! And even in moments when a human reaches that APM, their EPM would be an order of magnitude lower, whereas Alphastar's EPM matches its APM.

Original post:

Thank you so, so much for adding to the discussion! Would love to chat more about this if you see my reply and feel like it.

Regarding "mis-clicking", my understanding was that AlphaStar used Deepmind's PySC2[0][1], which in turn exposes Blizzard's SC2 API[2][3].

Here is the example for how to tell an SCV to build a supply depot:

  Actions()->UnitCommand(
    unit_to_build,
    ability_type_for_structure,
    Point2D(
      unit_to_build->pos.x + rx * 15.0f,
      unit_to_build->pos.y + ry * 15.0f
    )
  );
  
where unit_to_build->pos.x and unit_to_build->pos.y are the current position of the SCV and rx and ry are offsets. It's possible to fuzz this with some randomness, and indeed in the example, rx and ry are actually random (because the toy example just wants to create a supply depot in a truly random nearby spot; it doesn't care where). But the API doesn't attempt to "click" on an SCV and then use a hotkey and then "click" somewhere else. The API will never fail to select the correct SCV. It will also build precisely at the coordinates provided.

Point 1: Even if DeepMind added a fuzz to this method to make it so AlphaStar can "misclick" where the depot gets built, it cannot accidentally select the wrong SCV to build that depot. (Possibly wrong, as they could be using SpatialActions, see below)

Point 2: Most bot-makers wouldn't add a random fuzz to the depot placement coordinates to make their AI worse and I'd be super surprised if there was hard evidence somewhere that Alphastar had such a fuzz. (This is my main concern.)

My personal conclusion was that anything which looks like a "misclick" is, in fact, a "mis-decision". A human can decide "I want my marines to attack that carrier" but accidentally click the attack onto a nearby interceptor. I didn't think Alphastar could do that because I assumed it would use the Attack(Target: Unit) method instead of Attack(Target: Point) in that scenario -- and even if they used Attack(Target: Point) it would be used as Attack(Target: carrier->pos.x).

However, I realize now that they could be doing everything with SpatialActions (edit: it does, see paper[-1] pp. 6-7) (select point, select rect's)[4], and that they could have implemented a randomness layer to make AlphaStar literally mis-click.

I suppose I would need to test this API and dive into the replay files to first see if it's possible to discern the difference between Attack(Target: carrier_of_interest) and Attack(Target: carrier_of_interest->pos.x). Then, even if Alphastar is using the latter, it's still not clear that there's an additional element of randomness outside of the AI/ML control.

Has anyone already done an analysis of the replay files on this level, or has DeepMind released hard info on how they're controlling the bot?

-1: https://arxiv.org/pdf/1708.04782.pdf

0: https://www.youtube.com/watch?v=-fKUyT14G-8

1: https://github.com/deepmind/pysc2

2: https://github.com/Blizzard/s2client-proto

3: https://blizzard.github.io/s2client-api/index.html

4: https://blizzard.github.io/s2client-api/structsc2_1_1_spatia...

5: https://www.alexirpan.com/2019/02/22/alphastar.html

6: https://deepmind.com/blog/article/alphastar-mastering-real-t...

7: https://ychai.uk/notes/2019/07/21/RL/DRL/Decipher-AlphaStar-...


I unfortunately don't have the time to look at the papers in detail (I could totally see how a lot of what I observed could happen even without intentional misclicks so I do take that back), but I want to point out that January 2019 Alphastar (in exhibition matches against TLO and MaNa) is significantly worse than Fall 2019 Alphastar. Alphastar changed very markedly between those time periods.

If you look at an Alphastar Protoss game from the latter half of 2019, it's not relying on cheap tricks to win (such as the impossible stalker micro). Nothing it's doing leaps out as superhuman. Instead it just grinds down its opponent through a superior sense of timing and macro strategy. The two games I linked against Serral it wins by punishing when Serral overextends his reach or by altering its unit composition to better fit what Serral throws at it, rather than some ungodly micro. Nothing it's doing there couldn't be done by a human. In fact I would say in most of the battles, Serral's micro was better than Alphastar's.

Now it's also worth pointing out that Serral is playing on an unfamiliar computer, rather than his own, so there's a bit of a handicap going on and even Alphastar Protoss will still lose to humans, so it's not superhuman, but it's definitely an elite player and its play style is very difficult to distinguish from that of an elite player.


The search tree is huge in MTG. It has to be the largest of any game. You can take actions all the time, there are triggers all the time, and you can stack your actions on top of your opponent's actions. Huge space, really.

And then of course it's also imperfect information both in the sense of your opponent hand but also his deck. The cardpool is also very large for some formats.

I actually don't think it's solvable just by throwing MCTS at it with today's hardware, but would love to know more about this; if someone else has more insight please reply.

EDIT: Oh and there is also the meta-game / deck building aspect. If you are going to win a tournament you have to have favorable matchups against most players in the room.


Search space size is no longer a great heuristic for how difficult a game is for the latest AI approaches. For example, an RTS game has an absolutely enormous search space as well (effectively every unit of several hundred can move in every direction for every single tick of the game clock, many units have spells, and many spells are meant to stack with other spells), and Alphastar is a convincing demonstration that this is not out of the reach of current AIs. And you similarly have imperfect information where you don't know what your opponent is doing unless they are sufficiently close to your current units.

Even the meta-game/deck building aspect doesn't seem all that insurmountable as it doesn't seem fundamentally different from say a build order other than that it cannot change dynamically on the fly.


The search tree size isn't an insurmountable issue. For example, OpenAI Five managed to play DotA (though admittedly it relied heavily on perfect timing and the ability to look at every enemy on the minimap at the same time, and lost a lot when people learned to play around the only strategy it knew). I think it'd be feasible to get an AI to play a modern deck.

I think it'd be much harder for the AI to do deck building in a vacuum. You could model it as every game starting with you building your deck, but I can't see that converging stably.


MCTS is usually paired up with Deep Learning. This doesn't appear to have any problems with games with even larger branching factors. Look up AlphaZero and AlphaStar.
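Concretely, the selection step at each MCTS node is just a bandit rule; UCB1 is the classic choice, and the deep-learning variants (AlphaZero-style) swap the exploration term for a learned policy prior. A toy sketch of one node, with hypothetical move names and hidden win rates standing in for rollouts:

```python
import math
import random

def ucb1_pick(stats, c=1.4):
    """UCB1: balance a child's empirical win rate against how rarely it
    has been visited. `stats` maps move -> [visits, total_reward]."""
    total = sum(n for n, _ in stats.values())
    def score(move):
        n, w = stats[move]
        if n == 0:
            return float("inf")  # always try unvisited children first
        return w / n + c * math.sqrt(math.log(total) / n)
    return max(stats, key=score)

def run_node(iters=3000):
    # Hidden win rates play the role of simulation results from the subtree.
    hidden = {"move_a": 0.7, "move_b": 0.3}
    stats = {m: [0, 0.0] for m in hidden}
    for _ in range(iters):
        m = ucb1_pick(stats)
        stats[m][0] += 1
        stats[m][1] += 1.0 if random.random() < hidden[m] else 0.0
    return stats
```

Visits concentrate on the better move logarithmically fast, which is why a huge branching factor hurts less than raw counting suggests; the network's priors in AlphaZero-style systems prune it further.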


An MTG limited player named Ryan Saxe created an AI that drafts and builds decks to a highly successful degree: https://github.com/RyanSaxe/mtg

He was able to reach Mythic, the highest ranking tier on Magic Arena. Of course this is a different problem to actually playing the game (and probably significantly easier). That being said, this is one guy doing it as a side project with restricted resources.

I imagine that MTG could be played quite successfully by an AI if someone were to dedicate the resources. IMO much of the difficulty is in laying the groundwork: large amounts of data don't exist publicly, building the framework for a bot to play itself would be quite difficult, and the computation costs would be extremely expensive.


Wouldn’t the fact that the game is constantly changing, with new cards being added and old ones being removed, also make it harder to solve?


That's harder on humans than on computers. Computers have perfect information on what the legal cardpool is for the game they are playing.


Have people tried using GPT-3 for this?


Here's a recent paper from DeepMind about their efforts towards generalized game solving (caveat: I haven't actually read it yet).

https://arxiv.org/pdf/2112.03178.pdf

Abstract:

"Games have a long history of serving as a benchmark for progress in artificial intelligence. Recently, approaches using search and learning have shown strong performance across a set of perfect information games, and approaches using game-theoretic reasoning and learning have shown strong performance for specific imperfect information poker variants. We introduce Player of Games, a general-purpose algorithm that unifies previous approaches, combining guided search, self-play learning, and game-theoretic reasoning. Player of Games is the first algorithm to achieve strong empirical performance in large perfect and imperfect information games — an important step towards truly general algorithms for arbitrary environments. We prove that Player of Games is sound, converging to perfect play as available computation time and approximation capacity increases. Player of Games reaches strong performance in chess and Go, beats the strongest openly available agent in heads-up no-limit Texas hold’em poker (Slumbot), and defeats the state-of-the-art agent in Scotland Yard, an imperfect information game that illustrates the value of guided search, learning, and game-theoretic reasoning"


Imperfect information games that require agents to cooperate with each other haven't been solved yet (multi-agent cooperation is still a hard problem). Hanabi is a big(ish) research area. Contract bridge has had some work done on it.


I'm surprised that Hanabi is such a huge research area. Compared to bridge or poker, it seems like a far more simple game.


> I'm surprised ... it seems like a far more simple game.

This probably shouldn't surprise you. If you are researching how to attack a new/difficult class of problems, typically you look for the simplest versions of that class first.


I would agree with that. I just tended to assume that Hanabi would be a solved problem by now.


Bots are superhuman in self-play Hanabi: https://ai.facebook.com/blog/building-ai-that-can-master-com...

The remaining challenge is getting it to play well with human partners. Doing that requires modeling human conventions rather than learning weird bot conventions. That's hard because while you can collect essentially unlimited data through self play, it's hard to collect a lot of data playing with humans using reinforcement learning. AI algorithms are really bad at sample efficiency.


Contract Bridge is still untouched. I'm not sure if it's because of lack of resources, but it has a lot of difficult elements: cooperation, huge search trees due to imperfect information, and information theoretic aspects of bidding.


For some time I've been pondering about creating a multiplayer poker AI that would (1) play reasonably well but not at a superhuman level (2) not require expensive hardware.

Frustratingly, this seems to be an unsolved problem. Pluribus is only 6-max, requires fixed starting stack sizes and requires a lot of hardware.


I mean, it may not have been publicly solved, but there are definitely bots out there in the wild that are beating human fields on online poker sites.

But also, I think PokerSnowie is a bot made with more traditional ML methods (i.e. training on hand histories rather than solving an entire game tree with CFR), which basically fits the description.
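For context on what "solving with CFR" means: CFR's core building block is regret matching. Here's a minimal toy sketch (my own illustrative code, nothing to do with PokerSnowie's internals) of regret matching in self-play on rock-paper-scissors, where the time-averaged strategies converge to the uniform equilibrium:

```python
# Regret matching self-play on rock-paper-scissors. Illustrative only:
# real solvers run counterfactual regret minimization (CFR), which applies
# this same update at every decision point of a huge game tree.
ACTIONS = 3  # 0 = rock, 1 = paper, 2 = scissors
PAYOFF = [[0, -1, 1],
          [1, 0, -1],
          [-1, 1, 0]]  # PAYOFF[a][b]: row player's payoff for a vs b

def current_strategy(regrets):
    # Play each action in proportion to its positive cumulative regret.
    pos = [max(r, 0.0) for r in regrets]
    total = sum(pos)
    return [p / total for p in pos] if total > 0 else [1.0 / ACTIONS] * ACTIONS

def selfplay(iterations):
    # Seed one player asymmetrically so play doesn't start at equilibrium.
    regrets = [[1.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
    strat_sum = [[0.0] * ACTIONS for _ in range(2)]
    for _ in range(iterations):
        strats = [current_strategy(r) for r in regrets]
        for p in range(2):
            opp = strats[1 - p]
            # Expected value of each action against the opponent's mix.
            evs = [sum(opp[b] * PAYOFF[a][b] for b in range(ACTIONS))
                   for a in range(ACTIONS)]
            my_ev = sum(strats[p][a] * evs[a] for a in range(ACTIONS))
            for a in range(ACTIONS):
                regrets[p][a] += evs[a] - my_ev   # accumulate regret
                strat_sum[p][a] += strats[p][a]
    # The *time-averaged* strategies converge to Nash (uniform 1/3 each).
    return [[s / iterations for s in strat_sum[p]] for p in range(2)]
```

Full-game CFR does this same regret update at every node, weighted by the probability of reaching it; the principle is identical.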


Is there enough public info to reimplement PokerSnowie? I only know it uses neural nets in some way.


Does this mean that online poker is effectively over?


It's kind of dead-ish because there aren't big edges and most people who play online are decent enough. Online poker has been deteriorating for years (with maybe a small uptick due to the pandemic). But this article could have been written a few years ago, there's nothing new in it.


Ever since the US outlawed it, it's been dead-ish.


Chess has been "solved" by computers for a long time now, yet the online Chess community is bigger than ever. I don't see why it would be any different for Poker.


Big difference with poker is that it's all about money; the game doesn't work if people aren't risking something. And if computers are superior, the game breaks down.


You are worried that people will cheat in online poker, or what?

For poker, people can still play face to face, and it being a game of mixed strategies means it's probably a lot less important that poker bots are better than people than in chess. Top chess players use the computers to set up preparation bombs on their opponents, with rebuttals to moves for specific positions created by the computers. That doesn't work in a game of imperfect information and mixed strategies.


> You are worried that people will cheat in online poker, or what?

The norm for online chess is to not play for money, whereas the norm for online poker is to play for money, often for large sums.

Of course the game with higher stakes is more likely to attract cheaters.


> whereas the norm for online poker is to play for money, often for large sums

That is definitely not the norm. Most people are playing recreationally. Heck, it is illegal to play poker online with real money in 44/50 states in the US.


Yeah, but playing poker for play money is a very different game. Players have no incentive not to take huge risks, so the strategy changes completely.


And how do you suppose they enforce that?


> Big difference with poker is that it's all about money; the game doesn't work if people aren't risking something.

I don't see this, I used to really enjoy online poker tournaments for play-money. It's a fun and strategic game in and of itself with nothing more than minor bragging rights on the line, to me at least. I honestly don't see why it should be any different from chess in terms of purely recreational play. (You could say chess is a deeper game, and that's probably true in some ways, but that's not a positive feature for casuals like me with no desire to invest many hours/years into mastering the game.)


Poker is not really all about money. Plenty of games might only ask you to stake like $5 but the joy and aim of the game is really to outplay the table, not take home $25.


For the purpose of AI "conquering" poker the money makes a huge difference.

Would you play the same way if you were betting your life savings versus $5?


No, but my play would deteriorate at the high stakes. A computer would play the most optimal strategy it can regardless.


The difference for poker is that it takes a lot more play time to sort players by skill. There's no Elo to climb, and thus you don't get to play against more challenging opponents.

e.g. I'll lose 100% of my matches against a Chess GM but against the best poker player in the world I could go all in preflop every hand and still win the match ~20% of the time.

The only way to counter this variance is to play lots and lots of hands (law of large numbers). It could take weeks/months in a single matchup to determine the better poker player.
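For a rough sense of scale (the winrate and variance numbers below are assumptions I picked for illustration, not measurements):

```python
# Back-of-the-envelope: how many hands before a real winner's results are
# statistically distinguishable from break-even? The winrate and standard
# deviation below are illustrative assumptions, not measured values.
winrate = 5.0   # bb won per 100 hands
stdev = 80.0    # standard deviation, bb per 100 hands

# The 95% confidence interval around the observed winrate excludes zero
# once winrate > 1.96 * stdev / sqrt(number of 100-hand blocks):
blocks = (1.96 * stdev / winrate) ** 2
hands = blocks * 100  # on the order of 100,000 hands
```

Under these assumed numbers that's roughly 100k hands, i.e. thousands of hours at a live table, which is why short matches say almost nothing about who's better.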

In terms of playing for real money, in general you'll find more challenging competition at higher stakes. But then there's an incentive for bots.


Unlike with chess, in the poker world there's a huge financial incentive to run bots (or provide bot assistance to live players).


There is some misconception about the optimal strategy in the article. You can only win in poker if you recognize how your opponent deviates from the optimal strategy and then play a strategy to exploit him. You have to play an exploitable strategy yourself to exploit someone else.

Think about playing rock paper scissors against an opponent who always chooses rock. If you still play the optimal strategy (choosing randomly) you gain nothing...
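To make the arithmetic concrete, a quick throwaway sketch (toy code, not from the article):

```python
# Toy rock-paper-scissors EV calculation: the game-theory-optimal
# (uniform random) mix wins nothing from an always-rock opponent,
# while the exploitative response (always paper) wins every game.
BEATS = {'rock': 'scissors', 'paper': 'rock', 'scissors': 'paper'}

def payoff(us, them):
    if us == them:
        return 0
    return 1 if BEATS[us] == them else -1

def ev(our_mix, their_mix):
    # Expected payoff of one mixed strategy against another.
    return sum(p * q * payoff(a, b)
               for a, p in our_mix.items()
               for b, q in their_mix.items())

uniform = {'rock': 1 / 3, 'paper': 1 / 3, 'scissors': 1 / 3}
always_rock = {'rock': 1.0, 'paper': 0.0, 'scissors': 0.0}
always_paper = {'rock': 0.0, 'paper': 1.0, 'scissors': 0.0}

ev(uniform, always_rock)       # 0.0: optimal play gains nothing here
ev(always_paper, always_rock)  # 1.0: the exploit wins every time
```

The optimal mix only guarantees you can't lose; the money comes from deviating toward the exploit.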


> You can only win in poker if you recognize how your opponent deviates from the optimal strategy and then play a strategy to exploit him.

This is not true. If you play the optimal strategy, you will win against any opponent except one that plays optimally as well, in which case you'll break even. But of course, you'll win a lot more if you are able to adapt your strategy to exploit any weaknesses that you have detected.

And detecting weaknesses in your opponent(s) in live play is something that humans will remain better at than AI for quite a while, because it requires not just careful analysis of your opponent's actions but some contextual information as well. E.g., how old is your opponent, is he experienced or not, is he drunk, have you encountered similar players before, etc.


This is almost correct but not quite. Optimal strategy wins against many other strategies and breaks even against others. The ones it breaks even against are not necessarily optimal themselves, they just don't make mistakes vs the optimal one but might be exploitable themselves.

To be more precise: you only need to replicate the pure (non-mixed) actions of the optimal strategy to avoid losing against it. Your frequencies for the mixed actions can be completely off, though.


In theory you are right but when talking specifically about poker, I can't think of any example of imperfect play that isn't exploited (to a certain extent) by optimal play. Yes, playing the opponent would win more, but optimal play makes a (smaller) profit as well. Or would you know an example?


The imperfect play is in the mixing. Imagine you play the optimal strategy, except that every time you have a mixed action between bluffing and checking, you always bluff. You won't be exploited by an optimal strategy, but you will be very vulnerable to someone who bluff-catches a lot.
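A classic simplified river spot illustrates this (my own toy model with a pot-sized bet, not solver output): the GTO caller calls exactly often enough that every bluff breaks even, so over-bluffing isn't punished by them, but it's punished badly by someone who always calls.

```python
# Toy river spot (simplified model): we bluff a pot-sized BET into POT;
# the opponent calls with probability call_freq.
POT, BET = 1.0, 1.0

def bluff_ev(call_freq):
    # A bluff wins the pot when the opponent folds, loses the bet when called.
    return (1 - call_freq) * POT - call_freq * BET

gto_call = POT / (POT + BET)  # 0.5: makes every bluff exactly break even

bluff_ev(gto_call)  # 0.0  -> the GTO caller doesn't punish over-bluffing
bluff_ev(1.0)       # -1.0 -> a frequent bluff-catcher punishes it hard
```

Against the indifferent GTO caller, bluffing your mixed combos 100% of the time costs nothing; against the station it burns a bet every hand.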


> If you play optimal strategy, you will win against any opponent except one that plays optimal as well

How does the example you're responding to not win 50% of the time?

Rock v Rock = Tie

Rock v Paper = Loss

Rock v Scissors = Win

The optimal game theoretic play of randomly choosing rock paper scissors is inferior play against this particular opponent. All that game theoretic perfect play gets you is the benefit of getting at least a tie. Possibly more, but not always more for non-perfect play.


Poker isn't like this. Think rock, paper, scissors, crap, where the rule is that crap always loses. Now the uniform strategy over RPS will yield a profit vs. an opponent who plays crap sometimes.

It's very easy to play crap in poker
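Spelling that out (toy sketch, with "crap" as a fourth move that loses to everything):

```python
# Rock-paper-scissors-crap, where 'crap' loses to everything. The uniform
# RPS mix now profits against anyone who ever plays crap.
RPS = ['rock', 'paper', 'scissors']
BEATS = {'rock': 'scissors', 'paper': 'rock', 'scissors': 'paper'}

def payoff(us, them):
    if them == 'crap':
        return 1  # crap always loses
    if us == them:
        return 0
    return 1 if BEATS[us] == them else -1

def ev_uniform_rps(crap_freq):
    # EV of the uniform RPS mix vs an opponent who plays crap with
    # probability crap_freq and mixes RPS uniformly otherwise.
    total = 0.0
    for us in RPS:
        for them in RPS:
            total += (1 / 3) * ((1 - crap_freq) / 3) * payoff(us, them)
        total += (1 / 3) * crap_freq * payoff(us, 'crap')
    return total

ev_uniform_rps(0.0)   # 0.0: plain RPS, optimal play only breaks even
ev_uniform_rps(0.25)  # 0.25: profit exactly at the opponent's error rate
```

Unlike plain RPS, the unexploitable strategy here still wins against mistakes, which is the parent's point about poker.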


Right, but so what?

If the optimal strategy is a 50-50 split against someone else playing the optimal strategy, and at least a 50% win against someone who is not, it doesn't follow that the optimal strategy is the best play against an opponent who plays crap every game.

Identifying the crap player means that you increase your bets vs. what you would normally bet for perfect play. Perfect play is bulletproof, but it isn't necessarily maximal yield against a non-perfect player, even if that's the rough trendline.


The point being that in poker and in particular Texas Holdem - unlike in RPS - the optimal strategy will win against good players and completely obliterate weak players. Yes, the good human can win even more against bad human but make no mistake - it's extremely hard to not lose a lot vs optimal strategies.

> Identifying the crap player means that you increase your bets.

That's not true. There are a lot of different ways to exploit bad players depending on how bad they are. Also a lot of weak players these days have much more subtle leaks (e.g. never bluff-raise the river) and it takes some skill to spot and exploit them.


I reread the entire chain and realized I misinterpreted the context around what you said and thus came to the wrong conclusion of your understanding and meaning.

> > You can only win in poker if you recognize how your opponent deviates from the optimal strategy and then play a strategy to exploit him.

> This is not true.

You were right here when you said "This is not true." I still think there is more nuance than the discussion as a whole credits, but overall I think you were more right than wrong in our discussion and I was more wrong than right. Thanks for the chat.


I may be ignorant, but I still don't understand the fuss about this feat. I mean, poker is all about probability, right? With uncertainty about bluffing, and fraught with emotional control problems. So naturally a machine, not a human, is best suited for this kind of problem. The bluffing problem can also be solved with heuristic optimization techniques like GA. Thus, is it not obvious that machines/algorithms will beat humans in the long run?


The distinction is that the number of branches that manifest in games of incomplete information can become utterly massive, and then you add in 6-10 players at once, all playing at world class levels, and you have a real computation problem.

I mean, maybe this is a trivial thing to solve for you, in which case kudos, but it's pretty neat for a lot of onlookers (and participants).


This whole thing is a misconception about poker. Poker is never about calculating odds in your head, but about bluffing convincingly and reading the opponents' tells. Computers have no tells, and I doubt any ML can learn to recognize people's tells, since they are very subtle.

Anyone with basic math skills can figure the statistics part in their head - that doesn't guarantee a win.


> he opened a computer program called PioSOLVER, one of a handful of artificial-intelligence-based tools

So I checked out this tool, and the team describes itself as "programmers interested in algorithms"[0] ... what is the difference between A.I. and algorithms?

[0] https://www.piosolver.com/pages/about-us


The taxonomy I personally use is:

* Every computer program has algorithms, it's an extremely general term for "the idea behind how the computer will solve the problem". Advanced algorithms are typically those that took a lot of human effort to come up with.

* Machine Learning refers to a specific class of algorithms where the computer automatically figures out (part of) what it should do based on data.

* Deep learning is the subset of ML that uses deep neural networks.

* Artificial Intelligence is a marketing term, and is actually about the _problem_ being solved, not the technology being used to solve it. In particular, AI is any computer program that solves a problem people would previously expect can only be solved by a human applying creativity and/or intelligence, like playing a game, understanding natural language, or creating artwork. This definition is obviously a moving target as expectations change.

Deep learning is currently the most powerful and general toolkit for solving AI problems, so the concepts tend to get mixed together pretty frequently, but I personally like using the definitions above to keep things straight.


I like your taxonomy. But just to clarify: is the cruise control on my car "artificial intelligence?" Because a lot of dumb 15-year-old teenagers don't know they have to increase power while going up a hill, while my cruise control algorithm does.


My category for AI is based on "popular perception", not necessarily an actual comparison to what humans can or can't do. Most humans don't know how to multiply complex numbers, but that doesn't mean doing so demonstrates human-level intelligence and decision making.

In the specific case of cruise control, I don't think maintaining speed accurately is typically viewed as "difficult" for a computer today, so it would be deceptive to refer to as AI. 100 years ago that may have been different.


I like your framework.

Would you put statistical analysis "algorithms" in the same category as "Machine Learning"?


Sure, "classical" ML of various forms (knn, decision trees, etc.) are Machine Learning, but aren't Deep Learning. That's exactly why the distinction between those phrases is useful.


Clearest and most useful descriptions I've seen by far. Thanks for sharing


It just sounded like a good description. We never got any funding and started the business as a side project, so no need for big words :)


Algorithms are provably correct. If you've got no proof of correctness, the best you have is a heuristic.


Depends on whether you use the traditional or vernacular definition of AI. Traditionally it was more about the how, not the what: AI was an agent acting rationally in an environment. Pac-Man guided by Dijkstra? That's AI: there's an agent, an environment, and said agent is acting "rationally".

In modern vernacular it has become a synonym for machine learning or sometimes specifically neural networks.


The amount of VC funding you'll get.



