Essentially all the AI work I've seen in games aims for game-theory-optimal play, but I think it could be really interesting to consider AI for exploitative play. Does this exist? Poker's imperfect information, human pressure, and fallibility mean that players will inevitably stray from Nash equilibrium. Deciding how to exploit without getting exploited back yourself seems really fascinating from an AI perspective. At a glance it seems to require considering how others view you.
Within solvers, you can do something called "node locking": you "lock" a node in the game tree to play a fixed strategy, typically the strategy you suspect your opponent plays. This lets the solver calculate the optimal exploitative solution against your specific opponent.
PioSolver, the first public solver and the one mentioned in the article, has this feature.
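To make the idea concrete, here's a toy sketch of what "exploiting a locked strategy" means. All numbers and names are invented for illustration; this is not PioSolver's actual API or math. Once the opponent's frequency at a node is locked, finding our best response is just an EV comparison.

```python
# Toy sketch of exploiting a "locked" opponent strategy (invented numbers,
# not PioSolver's actual API or math). Suppose at some node we hold a pure
# bluff and can bet or give up, and we lock the opponent to fold to a bet
# 70% of the time -- far more often than an equilibrium strategy would.
OPP_FOLD_VS_BET = 0.7   # locked frequency we suspect this opponent uses
POT = 10.0              # current pot size, in big blinds
BET = 5.0               # our bet size

# EV of bluffing: win the pot when they fold, lose the bet when called.
ev_bluff = OPP_FOLD_VS_BET * POT - (1 - OPP_FOLD_VS_BET) * BET
ev_check = 0.0          # giving up: our bluff never wins at showdown

best = "bet" if ev_bluff > ev_check else "check"
print(best, ev_bluff)   # → bet 5.5
```

Against an equilibrium-ish fold frequency the bluff would be close to break-even; against the locked over-folder it becomes wildly profitable, which is the whole point of node locking.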
However, what often happens is that if you lock one node, several other nodes in the game tree over-adjust in drastic ways, forcing you to lock all of them, which may be infeasible. As a result, PioSolver recently introduced "incentives", which give a player an additional incentive to take a certain action. For example, you may suspect your opponent calls too much and doesn't raise enough; you can set that incentive and the solver will factor it into its calculations, giving you something close to an exploitative solution with a much simpler UX.
This feature was introduced only a few months ago, so it's still very much an active area of research, both for game theory nerds and for people trying to use the game theory nerds' research to make money!
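A rough sketch of the incentive idea, again with invented numbers: rather than hard-locking a strategy, you add a bonus to an action's utility and let the computed strategy shift toward it. The softmax here is a crude stand-in for whatever the solver actually does with utilities, not PioSolver's real math.

```python
import math

# Sketch of the "incentive" idea with invented numbers: instead of locking
# a fixed strategy, add a bonus to an action's utility so the computed
# strategy naturally shifts toward it. The softmax is a crude stand-in
# for the solver's actual strategy computation.

def softmax_response(utils):
    """Turn action utilities into a probability distribution over actions."""
    exps = [math.exp(u) for u in utils]
    total = sum(exps)
    return [e / total for e in exps]

actions = ["fold", "call", "raise"]
base_utils = [0.0, 0.2, 0.3]   # made-up baseline utilities
incentive  = [0.0, 1.0, -0.5]  # "calls too much, raises too little"

plain  = softmax_response(base_utils)
biased = softmax_response([b + i for b, i in zip(base_utils, incentive)])

# the incentivized strategy calls more (and raises less) than the plain one
print(round(plain[1], 2), round(biased[1], 2))  # → 0.34 0.65
```

The appeal over node locking is that one knob ("this player likes calling") propagates consistently through the whole computation instead of requiring you to lock every affected node by hand.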
I want to see strong AI used in video games, especially strategy games. People often retort that strong AI is not fun; it's too challenging and that's not what players want. But once we have a strong AI we can adjust its goal function in fun ways. What you're describing is effectively the same, and it's the first time I've seen this used in a strong AI.
An AI that plays a fixed exploitative strategy will end up getting figured out relatively quickly and counter-exploited pretty hard. This actually happens in real life when people attempt to deploy poker bots online.
Any exploitative AI also needs the ability to adjust in real time to a different exploitative strategy, and those adjustments themselves need to be hard to predict, and so on.
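As a toy illustration of why a fixed exploiter is predictable, here's a naive frequency-based exploiter for rock-paper-scissors (a hypothetical design, nothing to do with any real poker bot). It crushes a static opponent, but because its response rule is deterministic, anyone who knows the rule can counter-exploit it perfectly.

```python
import random
from collections import Counter

# Naive "exploiter" for rock-paper-scissors (toy, hypothetical design):
# track the opponent's empirical move frequencies and deterministically
# best-respond to their most common move. It beats a static opponent,
# but its rule is itself trivially counter-exploitable once known.

BEATS = {"rock": "paper", "paper": "scissors", "scissors": "rock"}

class FrequencyExploiter:
    def __init__(self):
        self.seen = Counter()  # opponent move counts observed so far

    def act(self):
        if not self.seen:
            return random.choice(list(BEATS))  # no data yet: play randomly
        most_common = self.seen.most_common(1)[0][0]
        return BEATS[most_common]  # deterministic, hence predictable

    def observe(self, opp_move):
        self.seen[opp_move] += 1

bot = FrequencyExploiter()
for _ in range(20):
    bot.observe("rock")    # opponent massively overplays rock
print(bot.act())           # → paper
```

An opponent who knows the bot's rule just plays scissors here and wins every round, which is exactly the "exploiter gets counter-exploited" loop described above.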
Yes, this exists! Look up models based on counterfactual regret minimisation - they learn to exploit regularities in their opponents' play and often stray from GTO play when it makes sense. I believe they have beaten poker professionals in thousands-of-hands playoffs, but I may be misremembering.
No, CFR is mainly just a way of computing Nash equilibria and (although in some sense it is an online, iterative algorithm) would typically be used to precompute Nash strategies, not update them in real time. Real poker-playing systems augment the CFR strategies with some real-time solving, but just to get even closer to Nash at the end of a hand.
On top of this, you could think about augmenting these systems to exploit weaknesses in opponent strategies. There is some work on this, but I don't think it's done much. The famous systems that played against professionals don't use it; they just try to get as close to GTO as possible and wait for opponents to screw up.
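For the curious, the core update inside CFR is regret matching. Here's a minimal self-play sketch on rock-paper-scissors, which is vastly simpler than real CFR over a poker game tree: the average strategy drifts toward the Nash equilibrium (uniform mixing), and it's all offline iteration, not live adaptation to an opponent.

```python
import random

# Minimal regret matching -- the update rule at the heart of CFR -- in
# self-play on rock-paper-scissors. A drastic simplification of real CFR
# over a poker tree, but it shows the key point: this is an offline
# iteration whose AVERAGE strategy approaches Nash (uniform mixing here),
# not a mechanism for adapting to a live opponent.

ACTIONS = 3  # 0=rock, 1=paper, 2=scissors
# PAYOFF[a][b] = payoff to a player choosing a against b
PAYOFF = [[0, -1, 1], [1, 0, -1], [-1, 1, 0]]

def strategy_from_regrets(regrets):
    pos = [max(r, 0.0) for r in regrets]
    total = sum(pos)
    return [p / total for p in pos] if total > 0 else [1.0 / ACTIONS] * ACTIONS

random.seed(0)
regrets = [0.0] * ACTIONS
strategy_sum = [0.0] * ACTIONS

for _ in range(20000):
    strat = strategy_from_regrets(regrets)
    for a in range(ACTIONS):
        strategy_sum[a] += strat[a]
    my = random.choices(range(ACTIONS), weights=strat)[0]
    opp = random.choices(range(ACTIONS), weights=strat)[0]  # symmetric self-play
    # regret: how much better action a would have done than what we played
    for a in range(ACTIONS):
        regrets[a] += PAYOFF[a][opp] - PAYOFF[my][opp]

avg = [s / sum(strategy_sum) for s in strategy_sum]
print([round(p, 2) for p in avg])  # roughly [0.33, 0.33, 0.33]
```

The current strategy swings around a lot from iteration to iteration; it's the time-averaged strategy that carries the equilibrium guarantee, which is why these systems precompute rather than adapt.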
Hmm, I see, thanks for the reply. My mistake - I watched an interview (that I can't find now, ugh) with a poker player who played against one of the top CFRM bots and claimed that it felt like it was adapting to his playstyle.
But it sounds like that must have been either a misunderstanding or some other part of the bot's algorithm, I guess.
...in case you would be willing to share some knowledge - what exactly is a GTO play in poker? Does it mean a Nash equilibrium strategy? Something else entirely?
Whenever I search this stuff I get practical poker strategy guides, but none of them seem to define the term haha
Two player poker is a zero sum game, where GTO play is very well-defined as just playing a Nash equilibrium strategy. The solvers try to get as close as they can to that.
Life is a lot more complicated in multiplayer poker. There are Nash equilibria, but potentially many with different payoffs, and you can't force your opponents to choose the one you're aiming for. So in that case, it's not so obvious what "optimal" means.
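A toy example of that non-uniqueness (an invented 2x2 coordination game, nothing poker-specific): both pure profiles below are Nash equilibria, with different payoffs for each player, so "play the equilibrium" is no longer a complete prescription.

```python
# An invented 2x2 coordination game showing multiple Nash equilibria
# with different payoffs. PAYOFFS[(r, c)] gives
# (row player's payoff, column player's payoff).
PAYOFFS = {
    ("A", "A"): (2, 1),
    ("A", "B"): (0, 0),
    ("B", "A"): (0, 0),
    ("B", "B"): (1, 2),
}

def is_pure_nash(r, c):
    """Nash check: no player gains by unilaterally deviating."""
    pr, pc = PAYOFFS[(r, c)]
    row_ok = all(PAYOFFS[(alt, c)][0] <= pr for alt in "AB")
    col_ok = all(PAYOFFS[(r, alt)][1] <= pc for alt in "AB")
    return row_ok and col_ok

equilibria = [(r, c) for r in "AB" for c in "AB" if is_pure_nash(r, c)]
print(equilibria)  # → [('A', 'A'), ('B', 'B')], each preferred by a different player
```

The row player would rather everyone coordinate on (A, A) and the column player on (B, B); neither can unilaterally force their preferred equilibrium, which is the multiplayer problem in miniature.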
As for CFR adapting to opponent play: CFR could bias its compute resources towards really finely optimizing strategies for the most likely scenarios facing certain players, and it seems like this has been done during poker tournaments.
But within those situations, it would still be trying to more perfectly approximate the Nash strategy, vs. more experimental approaches which actually choose a different strategy to exploit opponent weaknesses.
it took me a while to track this down last month:
https://codegolf.stackexchange.com/questions/tagged/king-of-...
there's also cops-and-robbers and at least one other "all AIs compete against each other" challenge, with the submitter usually making the first couple of "naive" bots.
I think it'd be interesting to see if an AI with visual input playing exploitatively can outperform an AI playing GTO. In doing so, we could measure the effect of visual tells.
I think you have to be careful with saying stuff like "optimal strategies are unexploitable", because it usually means "unexploitable in a particular game theory sense".
Whether the assumptions of the Nash equilibrium (or any of the others) make sense for your situation in a game of poker is an empirical question, right? It's not a given that playing a NE means you'll be "perfect" in the human sense of the word, or that you'll get the best possible outcome.
The best superhuman poker AIs at the moment do not play equilibria either, for instance.
I agree that, because of rake or table fees in cash games, or the competition structure in tournaments, the game-theoretically optimal choice may not be the right choice in practical play.
However, the situation with an AI-powered competitor that uses exploitative play is identical to that of a human: the GTO player will gradually take their chips at no risk.
It's not that they're optimal but that they've chosen not to be optimal, and that's why they lose money against GTO.
The AI is at least unemotional about this; humans with a "system" easily get tilted by GTO play and throw tantrums. "How can it get there with KT offsuit? What kind of idiot bluffs here with no clubs?" Well, the answer will usually be: the one that's taking all your chips, be better. Humans used to seeing exploitable patterns in the play of other humans may mistake ordinary noise in the game for exploitable play in a GTO strategy, and then get really angry when it turns out to be a mirage.
Right, I see what you're saying, but this is what I'm disputing: in two-player zero-sum games, what you wrote is true, but those properties of Nash equilibria don't generalise.
When there are more players, there can be multiple Nash equilibria, and (unlike in the two-player case) combinations of equilibrium strategies may no longer form an equilibrium. So it's no longer true that you cannot be exploited, because that depends on the other players' strategies too, and you cannot control those.