Hacker News

Essentially all AI work I've seen in games aims for game-theory-optimal play, but I think it could be really interesting to consider AI for exploitative play. Does this exist? In poker, imperfect information, human pressure, and fallibility mean that players will inevitably stray from Nash equilibrium. Deciding how to exploit without getting exploited back oneself seems really fascinating to consider from an AI perspective. At a glance it seems to require considering how others view you.



Within solvers, you can do something called "node locking", which means you "lock" a node in the game tree to play a fixed strategy. You would typically lock it to play the way you suspect your opponent plays. This lets the solver calculate the optimal exploitative solution against your specific opponents.

Piosolver, the first public solver and the one mentioned in the article, has this feature.

However, what often happens is that if you lock one node, several other nodes in the game tree over-adjust in drastic ways, forcing you to lock all of them, which may be infeasible. As a result, Piosolver recently introduced "incentives", which give a player in the game an additional incentive to take a certain action. For example, you may suspect your opponent calls too much and doesn't raise enough, so you can just set that incentive and the solver will fold it into its calculations, giving you something similar to an exploitative solution with a much simpler UX.

This feature was literally just introduced a few months ago, so it's still very much an active area of research, both for game theory nerds and for people trying to use the game theory nerd research to make money!
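The node-locking idea fits in a few lines: fix ("lock") the opponent's strategy and compute your best response to it. This is just a sketch of the concept, not how Piosolver actually works, and the toy payoff matrix below is invented for illustration:

```python
# Toy "node locking": the opponent's mixed strategy is locked, and we
# compute the best response to it. Payoffs are invented for illustration.
# Rows are our actions, columns are the opponent's; entries are our payoff.
payoffs = {
    ("bluff",     "call"): -1.0,   # bluff gets called: we lose
    ("bluff",     "fold"):  1.0,   # bluff works
    ("value_bet", "call"):  2.0,   # value bet gets paid off
    ("value_bet", "fold"):  0.5,   # value bet folds out worse hands
}

def best_response(locked_opponent):
    """Best response to a locked opponent strategy {action: probability}."""
    our_actions = {row for row, _ in payoffs}
    evs = {
        row: sum(p * payoffs[(row, col)] for col, p in locked_opponent.items())
        for row in our_actions
    }
    return max(evs, key=evs.get), evs

# Suppose we lock the opponent to call far too often:
action, evs = best_response({"call": 0.9, "fold": 0.1})
```

A real solver does this over an enormous game tree rather than a single matrix, but the principle is the same: against this calling station, value betting (EV 1.85) crushes bluffing (EV -0.8).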


I want to see strong AI used in video games, especially strategy games. People often retort that strong AI is not fun; it's too challenging and that's not what players want. But once we have a strong AI we can adjust its goal function in fun ways. What you're describing is effectively the same, and it's the first time I've seen this used in a strong AI.


GT Sophy is now a permanent feature in Gran Turismo and basically does this!


Thanks for bringing that to my attention.

I still haven't seen an AI for a turn-based strategy game. There's AlphaStar, but it wins via APM, not strategy.


In essence, you need strong (probably unbeatable) game AI in order to make more interesting weak (beatable but challenging and fun) game AI.


An AI that plays a fixed exploitative strategy will end up getting figured out relatively quickly and counter-exploited pretty hard. This actually happens in real life sometimes when people attempt to deploy poker bots online.

Any exploitative AI also needs the ability to adjust in real time to a different exploitative strategy, which also needs to be not easily predictable, etc.
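As a rough sketch of that kind of real-time adjustment (the window size and thresholds here are invented, not from any real bot): track the opponent's recent fold-to-bet frequency over a sliding window and move your bluffing frequency against it, so the adaptation itself decays as the opponent changes.

```python
from collections import deque

# Sketch of real-time adaptation: watch the opponent's recent fold-to-bet
# rate and adjust our bluffing frequency. All numbers are invented.
class AdaptiveBluffer:
    def __init__(self, window=50):
        self.folds = deque(maxlen=window)  # 1 = opponent folded to our bet

    def observe(self, opponent_folded):
        self.folds.append(1 if opponent_folded else 0)

    def bluff_frequency(self):
        if len(self.folds) < 10:
            return 0.3                     # default until we have data
        fold_rate = sum(self.folds) / len(self.folds)
        if fold_rate > 0.6:
            return 0.8                     # over-folder: bluff relentlessly
        if fold_rate < 0.3:
            return 0.1                     # calling station: stop bluffing
        return 0.3
```

Because only the last `window` hands count, an opponent who adjusts back (starts calling down) pulls the bot's bluffing rate back down within a window's worth of hands, which is the "adjust in real time" part; making the thresholds themselves unpredictable is the harder open problem.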


Yes, this exists! Look up models based on counterfactual regret minimisation - they learn to exploit regularities in their opponents' play, and often stray from the GTO play when it makes sense. I believe they have beaten poker professionals in thousands-of-hands playoffs but I may be misremembering.


> I believe they have beaten poker professionals in thousands-of-hands playoffs

I really don't know anything about poker AIs but could it be you are referring to Libratus and/or Pluribus[0]?

[0]: https://noambrown.github.io/


To be honest I didn't have a specific AI in mind, more the technique, but it sounds like these ones also use a variant of CFRM.


I do not believe these attempt to exploit regularities.


No, CFR is mainly just a way of computing Nash equilibria and (although in some sense it is an online, iterative algorithm) would typically be used to precompute Nash strategies, not update them in real time. Real poker-playing systems augment the CFR strategies with some real-time solving, but just to get even closer to Nash at the end of a hand.

On top of this, you could think about augmenting these systems to exploit weaknesses in opponent strategies. There is some work on this, but I don't think it's done much. The famous systems that played against professionals don't use it; they just try to get as close to GTO as possible and wait for opponents to screw up.
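For anyone curious, the core update inside CFR is regret matching, which fits in a few lines. This is a toy self-play sketch on rock-paper-scissors, not a real poker solver - real CFR runs this same update at every decision node of the game tree:

```python
import random

# Regret matching in self-play on rock-paper-scissors. Each player's
# *average* strategy converges toward the Nash equilibrium (1/3 each).
ACTIONS = ["rock", "paper", "scissors"]
WINS = {("rock", "scissors"), ("paper", "rock"), ("scissors", "paper")}

def utility(a, b):
    return 0 if a == b else (1 if (a, b) in WINS else -1)

class RegretMatcher:
    def __init__(self):
        self.regret = [0.0] * 3
        self.strategy_sum = [0.0] * 3

    def strategy(self):
        # Play each action in proportion to its positive cumulative regret.
        pos = [max(r, 0.0) for r in self.regret]
        total = sum(pos)
        s = [p / total for p in pos] if total > 0 else [1 / 3] * 3
        for i in range(3):
            self.strategy_sum[i] += s[i]
        return s

    def update(self, my_action, opp_action):
        # Regret per action: what it would have earned minus what we earned.
        earned = utility(ACTIONS[my_action], ACTIONS[opp_action])
        for i in range(3):
            self.regret[i] += utility(ACTIONS[i], ACTIONS[opp_action]) - earned

    def average_strategy(self):
        total = sum(self.strategy_sum)
        return [x / total for x in self.strategy_sum]

random.seed(1)
p1, p2 = RegretMatcher(), RegretMatcher()
for _ in range(20000):
    a1 = random.choices(range(3), weights=p1.strategy())[0]
    a2 = random.choices(range(3), weights=p2.strategy())[0]
    p1.update(a1, a2)
    p2.update(a2, a1)
avg = p1.average_strategy()  # close to [1/3, 1/3, 1/3]
```

Note that it is the time-averaged strategy, not the current one, that approaches equilibrium - which is part of why these systems precompute rather than adapt live.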


Hmm, I see, thanks for the reply. My mistake - I watched an interview (that I can't find now, ugh) with a poker player who played against one of the top CFRM bots and claimed that it felt like it was adapting to his playstyle.

But it sounds like that must have been either misunderstanding or some other part of the bot's algorithm I guess.


...in case you would be willing to share some knowledge - what exactly is a GTO play in poker? Does it mean a Nash equilibrium strategy? Something else entirely?

Whenever I search this stuff I get practical poker strategy guides, but none of them seem to define the term haha


Two player poker is a zero sum game, where GTO play is very well-defined as just playing a Nash equilibrium strategy. The solvers try to get as close as they can to that.

Life is a lot more complicated in multiplayer poker. There are Nash equilibria, but potentially many with different payoffs, and you can't force your opponents to choose the one you're aiming for. So in that case, it's not so obvious what "optimal" means.

As for CFR adapting to opponent play: CFR could bias its compute resources towards really finely optimizing strategies for the most likely scenarios facing certain players, and it seems like this has been done during poker tournaments.

But within those situations, it would still be trying to more perfectly approximate the Nash strategy, vs. more experimental approaches which actually choose a different strategy to exploit opponent weaknesses.


GTO poker = whatever solvers say.

Common expression is "deviate from GTO" where you know what the solver would do but decide to play differently.


I see, thanks, so I guess it depends on what the solver is actually doing.


The obvious follow up question: are there methods in use to bait such models into suboptimal play and then switch play style to exploit that?


> Imperfect information, human pressure... inevitably stray from Nash equilibrium

Human pressure yes. Imperfect information no.

When we talk about Nash equilibrium for a game like poker, it's already based on imperfect information.


Ah right, thanks


It took me a while to track this down last month: https://codegolf.stackexchange.com/questions/tagged/king-of-... There's also cops-and-robbers and at least one other "all AIs compete against each other" challenge, with the submitter usually making the first couple of "naive" bots.


very interesting share ^ !


I think it'd be interesting to see whether an AI with visual input playing exploitatively can outperform an AI playing GTO. In doing so, we could measure the effect of visual tells.


You mean, can the exploitative strategy take money from fish faster? Yes. But it doesn't need to care about visual tells.

The point of the optimal strategy is that it's unexploitable so you can disregard the other player's actions (in the game or outside it) entirely.

All exploitative strategies are in turn exploitable.
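A toy numeric version of that point, using rock-paper-scissors (the "fish" frequencies are invented): the equilibrium strategy can't be beaten by anything, the exploitative strategy takes the fish's money faster, and the exploiter is in turn wide open.

```python
# Rock-paper-scissors EVs: Nash is unexploitable, the exploiter wins
# more vs. the fish but is itself counter-exploitable. Strategies invented.
WINS = {("rock", "scissors"), ("paper", "rock"), ("scissors", "paper")}

def utility(a, b):
    return 0 if a == b else (1 if (a, b) in WINS else -1)

def ev(strat_a, strat_b):
    """Expected payoff per round of mixed strategy a against b."""
    return sum(pa * pb * utility(x, y)
               for x, pa in strat_a.items()
               for y, pb in strat_b.items())

nash    = {"rock": 1 / 3, "paper": 1 / 3, "scissors": 1 / 3}
fish    = {"rock": 0.8, "paper": 0.1, "scissors": 0.1}  # far too much rock
exploit = {"paper": 1.0}     # best response to the fish
counter = {"scissors": 1.0}  # counter-exploit of the exploiter

# ev(nash, fish)       -> 0.0   (uniform can't be beaten, or beat anyone)
# ev(exploit, fish)    -> 0.7   (takes the fish's money much faster)
# ev(counter, exploit) -> 1.0   (the exploiter is maximally exploitable)
```

One caveat: in RPS the equilibrium only breaks even against mistakes, whereas in poker GTO still profits from them; the unexploitability point is the same either way.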


I think you have to be careful with saying stuff like "optimal strategies are unexploitable", because it usually means "unexploitable in a particular game theory sense".

Whether the assumptions of the Nash equilibrium (or any of the others) make sense for your situation in a game of poker is an empirical question, right? It's not a given that playing a NE means you'll be "perfect" in the human sense of the word, or that you'll get the best possible outcome.

The best superhuman poker AIs at the moment do not play equilibria either, for instance.


I agree that, because of rake and table fees in cash games, or the competition structure in tournaments, a game-theoretically optimal choice may not be the right choice in practical play.

However, the situation with an AI-powered competitor that uses exploitative play is identical to that with a human: the GTO play will gradually take their chips at no risk.

It's not that they're optimal but that they've chosen not to be optimal, and that's why they lose money against GTO.

The AI is at least unemotional about this; humans with a "system" easily get tilted by GTO play and throw tantrums. How can it get there with KToff? What kind of idiot bluffs here with no clubs? Well, the answer will usually be: the one that's taking all your chips, be better. Humans used to seeing exploitable patterns in the play of other humans may mistake ordinary noise in the game for exploitable play in a GTO strategy, and then get really angry when it's a mirage.


Right, I see what you're saying, but this is what I'm disputing - in two player games, what you wrote is true, but those properties of Nash equilibria don't generalise.

When there are more players, there can be multiple Nash equilibria, and (unlike the two-player case) combinations of equilibrium strategies may no longer form an equilibrium. So it's no longer true that you cannot be exploited, because that depends on the other players' strategies too, and you cannot control those.

(See this paper for instance: https://webdocs.cs.ualberta.ca/~games/poker/publications/AAM...)


Yes, I agree that more players makes the theory at least extremely difficult and perhaps imponderable. That paper was interesting, thanks



