At first one would think that these games revolve around body language, but really, they're mainly about keeping track of a complex sequence of choices made by others and tracking (and paring down) a set of possible truths based on those choices.
This is hard for humans to do, and is made interesting by the ample opportunities for intuitive leaps, but I'm not at all surprised that a program would be better at it. It's very easy in these games to incriminate yourself in ways that might go unnoticed by human players but would be incredibly obvious to an agent that's keeping a perfect record of everything that's happened.
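For illustration, here is a minimal sketch of what "paring down a set of possible truths" can look like for a program. It assumes simplified Avalon-style rules that are not taken from the thread: 5 players, 2 spies, and spies may choose to pass or fail a mission, so k fail cards on a mission means at least k spies were on that team. Votes and special roles are ignored, and the names `possible_worlds` and `consistent` are hypothetical.

```python
from itertools import combinations

# Hypothetical setup: 5 players, 2 of whom are secretly spies.
PLAYERS = ["A", "B", "C", "D", "E"]
NUM_SPIES = 2


def consistent(spies, team, fails):
    # A spy on the team *may* play a fail card, so the number of fails seen
    # can never exceed the number of spies who were on the mission.
    return fails <= len(spies & set(team))


def possible_worlds(missions):
    # Start from every possible spy assignment, then discard each one that
    # contradicts an observed mission result. Perfect recall is free here.
    worlds = [frozenset(c) for c in combinations(PLAYERS, NUM_SPIES)]
    for team, fails in missions:
        worlds = [w for w in worlds if consistent(w, team, fails)]
    return worlds
```

For example, `possible_worlds([(["A", "B"], 1), (["B", "C", "D"], 0), (["C", "E"], 1)])` leaves four candidate spy pairs: {A, C}, {A, E}, {B, C} and {B, E}. The clean second mission rules nothing out, because spies are allowed to pass, which is exactly the kind of detail a careful bookkeeper tracks and a human tends to forget.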
I think if I had guessed, I'd have thought this particular hidden role game was easily beaten with AI since the meaningful choices are so limited. Still, it was worth actually proving. I suspect something like One Night Werewolf is unbeatable by AI, though, since it's so heavily based on communication and there's only a single vote in the game.
ONUW harder than Avalon? Yeah, I don't think so. Avalon has ~15 team selections (something like 60 picks on average), plus over 100 yes/no votes and ~20 fail/success votes... ONUW has pretty much no choices at all.
>one-night werewolf is unbeatable
You're asking the wrong question for starters. The AI doesn't need to "beat" the game; it's being measured on whether it's better at the game than humans. In my experience, most One Night games devolve into a 50/50 guess. An AI would be better at calculating the odds of the two proposed game states that the players can't actually tell apart, so it would almost trivially win more often than human players.
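A minimal sketch of "calculating the odds of the two proposed game states" is plain Bayes' rule over the candidate states. The state names and all the numbers below are hypothetical: the priors would come from the deal, and the likelihoods (how probable the observed claims are under each state) would have to come from some model of how players behave, which the thread does not specify.

```python
def posterior(priors, likelihoods):
    """Bayes' rule over a finite set of candidate game states.

    priors: P(state) before hearing the claims.
    likelihoods: P(observed claims | state), a hypothetical behaviour model.
    """
    joint = {state: priors[state] * likelihoods[state] for state in priors}
    evidence = sum(joint.values())
    return {state: p / evidence for state, p in joint.items()}


# Hypothetical numbers: two states the table can't tell apart a priori.
odds = posterior(
    priors={"wolf_is_P3": 0.5, "wolf_is_P4": 0.5},
    likelihoods={"wolf_is_P3": 0.6, "wolf_is_P4": 0.2},
)
```

With these made-up inputs the "50/50 guess" becomes roughly 75/25 in favour of `wolf_is_P3`. The hard part in practice is estimating the likelihoods, not the arithmetic.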
Using language is one of the team's future aims. I don't think it would be impossible, since outside of communication for reasoning / deduction purposes, every in-game action can be described in an obvious and unambiguous manner.
Please let me know if I’m mistaken, but it seems that, disappointingly, there is no measure of player quality, especially considering that the game referenced is a fan-made clone of the physical game. There’s no ladder, MMR, ELO, etc.
So of course someone who’s played the game 100,000 or even 10 times is likely to beat someone who’s only played a handful of games. It’s not a bold (or arguably fair) claim to have beaten a majority of players if those players are all brand new, or if only a minority of them are veterans, which I suspect is likely given the nature of web-based games.
This is interesting in itself. 10 years ago it would have been very hard to write a program beating a human at a game where we don't know what makes a player better than another.
We do know what makes a player better, it's win rate. What GP is saying is there seems to be no leaderboard or ELO/MMR to really understand how good the AI actually is at the game.
Is this achievement closer to AlphaGo beating Lee Sedol, or is it more like AlphaStar beating a bunch of Bronze league players?
On the other hand, I would bet that this is a small, dedicated community. Much like mafia game communities, I would expect this to be a dedicated fan group.
In the 90s, all games were human-mastery only. Even writing a strong checkers AI or Connect4 AI was considered a major accomplishment. Only Tic-Tac-Toe was really bot-perfect.
With AIs winning in Chess, Go, Poker, and more... I guess the kids these days have forgotten what things were like just a few decades ago.
Contests judged on aesthetic merit may be the last bastion of human advantage. Boxing and figure skating are judged like this. Chess and Go aren't, but most commentators seem to find the machine's moves ugly, so perhaps there'll someday be a category of chess judged for elegance and style.
As the other poster has listed: the superhuman Poker AI is already built. They extend the poker-AI to play another game here.
Actually, CFR is really interesting, because it computes an approximation of the Nash equilibrium and, using randomized (mixed) strategies, plays close to the theoretical optimum. It's a straightforward concept from a game-theory perspective, but the approximation is the interesting part: since an exact Nash equilibrium is too difficult to calculate, their approximate one seems to be good enough in practice.
Any "game" whose Nash equilibrium involves randomized strategies should benefit from CFR. That covers most games of bluffing and hidden information (Poker, Avalon, Pokemon, Magic: The Gathering), and even real-world negotiations, diplomacy, business transactions, etc.
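To make the "approximation of the Nash equilibrium" idea concrete, here is a sketch of regret matching, the update CFR applies at every information set of a game tree. In a one-shot game where each player makes a single decision, CFR reduces to exactly this loop. This is not the researchers' implementation: the 2x2 zero-sum payoff matrix is a hypothetical example chosen because its unique equilibrium is mixed (both players play their first action with probability 0.4).

```python
# Hypothetical zero-sum game: row player's payoff; the column player gets the negative.
PAYOFF = [[2.0, -1.0],
          [-1.0, 1.0]]


def regret_matching(regrets):
    # Play each action in proportion to its positive cumulative regret;
    # fall back to uniform when no action has positive regret.
    positive = [max(r, 0.0) for r in regrets]
    total = sum(positive)
    if total > 0:
        return [p / total for p in positive]
    return [1.0 / len(regrets)] * len(regrets)


def train(iterations):
    row_regret, col_regret = [0.0, 0.0], [0.0, 0.0]
    row_sum, col_sum = [0.0, 0.0], [0.0, 0.0]
    for _ in range(iterations):
        row = regret_matching(row_regret)
        col = regret_matching(col_regret)
        # Expected payoff of each pure action against the opponent's current mix.
        row_values = [sum(PAYOFF[i][j] * col[j] for j in range(2)) for i in range(2)]
        col_values = [-sum(PAYOFF[i][j] * row[i] for i in range(2)) for j in range(2)]
        row_ev = sum(row[i] * row_values[i] for i in range(2))
        col_ev = sum(col[j] * col_values[j] for j in range(2))
        for a in range(2):
            # Regret: how much better action a would have done than the current mix.
            row_regret[a] += row_values[a] - row_ev
            col_regret[a] += col_values[a] - col_ev
            row_sum[a] += row[a]
            col_sum[a] += col[a]
    # The *average* strategy is what approaches the equilibrium.
    return ([s / iterations for s in row_sum],
            [s / iterations for s in col_sum])


row_avg, col_avg = train(100_000)
```

Both `row_avg` and `col_avg` should come out close to `[0.4, 0.6]`. The per-iteration strategies keep oscillating; only their running average converges, which is why CFR-style bots report an averaged policy and why the result is an estimate rather than an exact equilibrium.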
-----------
Since CFR has already proven superhuman at Poker, the research problem now is extending this Nash equilibrium approximation to other games. IMO, Avalon / Werewolf games are too simple; AI researchers should move on to Magic: The Gathering or other more complex games like that instead.