Using AI to design board games (boardgamegeek.com)
107 points by chaddeshon on Jan 4, 2012 | 31 comments



From the article: "Should the mantle of 'creator' lie with the program or the programmer?"

Strangely enough, this was the exact dilemma that Clarissa faced in Episode 29 of Clarissa Explains It All, "Poetic Justice," in which she developed a program that generated poetry. When her program's poetry won the school's contest, she decided to publicly relinquish her award to her computer. http://en.wikipedia.org/wiki/List_of_Clarissa_Explains_It_Al...


What an obscure reference to use. That show aired from 1991 to 1994, when I was a mere toddler.


Wait ten more years and you'll be able to use obscure references too.


As far as pop culture goes, I think Clarissa still takes the cake for female programmer characters. Acid Burn (from Hackers) is a close second...


My winner is Veronica Mars's Mac whose scheme to get a new car was downright evil genius (scalable too, but she chose to limit herself). Of course, Burn was right, RISC changed everything, though not yet in desktop PCs.


Learn something new every day -- Clarissa was a programmer! Loved that show!


Interesting application of genetic algorithms. The weird thing about GAs, though (and any optimiser), is that you need an "error function" that you will use to determine when one solution is better than another. Presumably he designed an error function for "fun", and his GA system finds local minima within it. But this is what I mean by "weird thing": you have to come up with a quantification of what makes a game fun, and _this_ is ultimately what will determine what kinds of solutions you come up with. Regardless of how the optimiser works, the error function is what the programmer is designing, and therefore the programmer is designing the space in which solutions will exist. So I would argue that any specific game should be attributed to the programmer (or whoever designed the error function); any specific game just happens to be one choice within the available search space.

Unless you want to manually try every iteration and rate it from 1 to 10, this implies that it's possible to come up with a "funness" model of human game playing. That in itself is sort of an interesting thing, falling somewhere between philosophy and psychology rather than computer science.
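
To make that concrete, here is a minimal sketch of the loop in Python. Everything in it is invented for illustration -- a toy rule space and a toy score_fun() standing in for whatever "funness" model you pick -- but it shows where the designer's fingerprints end up: entirely inside score_fun().

  import random

  RULE_OPTIONS = {                      # toy rule space, purely illustrative
      "board_size": [3, 4, 5, 6],
      "win_length": [3, 4, 5],
      "capture":    [True, False],
  }

  def random_game():
      return {k: random.choice(v) for k, v in RULE_OPTIONS.items()}

  def mutate(game):
      child = dict(game)
      key = random.choice(list(RULE_OPTIONS))
      child[key] = random.choice(RULE_OPTIONS[key])
      return child

  def score_fun(game):
      # The crux: a stand-in "funness" model. Whoever writes this function
      # has already decided what kinds of games can win the search.
      return -abs(game["board_size"] - game["win_length"] - 1)

  population = [random_game() for _ in range(20)]
  for _ in range(100):
      population.sort(key=score_fun, reverse=True)
      parents = population[:10]
      population = parents + [mutate(random.choice(parents)) for _ in range(10)]

  print(population[0])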


The problem of assigning a "fun value" to a game is an active area of research. Julian Togelius and Georgios Yannakakis are some of the big names looking at this topic right now.

One approach is to use tried-and-true statistical techniques to build models from user feedback. Often a player will be given two variations of a game and asked which one was more fun. This input, combined with various other factors (how did the player perform? how many enemies were there in each version?), helps construct some sort of classifier to automatically evaluate new game variations.
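
In code, that pairwise setup often boils down to learning a preference function over feature differences. A sketch (the features and training data here are invented, not taken from any actual study):

  import numpy as np
  from sklearn.linear_model import LogisticRegression

  # Each game variant is described by a few hand-picked features
  # (hypothetical: enemy count, player deaths, level length).
  def features(variant):
      return np.array([variant["enemies"], variant["deaths"], variant["length"]])

  # Training data: pairs (a, b) plus which one the player said was more fun.
  pairs = [
      ({"enemies": 5, "deaths": 1, "length": 200},
       {"enemies": 30, "deaths": 9, "length": 200}, 0),   # preferred the first
      ({"enemies": 12, "deaths": 3, "length": 250},
       {"enemies": 2, "deaths": 0, "length": 150}, 0),
      ({"enemies": 8, "deaths": 2, "length": 220},
       {"enemies": 40, "deaths": 12, "length": 300}, 0),
      ({"enemies": 1, "deaths": 0, "length": 100},
       {"enemies": 10, "deaths": 2, "length": 220}, 1),   # preferred the second
  ]

  # Learn on the feature *difference*; the label says which side won.
  X = np.array([features(a) - features(b) for a, b, _ in pairs])
  y = np.array([label for _, _, label in pairs])
  model = LogisticRegression().fit(X, y)

  # Score a brand-new pair of variants.
  a = {"enemies": 6, "deaths": 1, "length": 210}
  b = {"enemies": 25, "deaths": 8, "length": 210}
  print("P(player prefers b):",
        model.predict_proba([features(a) - features(b)])[0][1])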

An alternate approach, which I explored during my Master's research on the topic, was to analyze existing levels in commercially released videogames and construct a model of fun from that, as opposed to relying on subjective human feedback. I chose to analyze Super Mario Brothers. It turns out you can see a very characteristic pattern in how the difficult portions of the levels are arranged: a rhythmic interplay of difficult and easier sections, which can be nicely described in terms of Flow and the Yerkes-Dodson law. We used this as a criterion for evaluating automatically generated levels as part of a larger system.

The applications of this are pretty wide. Perhaps we could automatically evaluate web page interactions against a model of fun to better engage users? Perhaps we could detect when people are not having fun and automatically generate alternate designs?


Here's my restatement of the statistical approach (correct me if I'm wrong):

Instead of proposing some metric for fun that is valid a priori, we are going to look for very grounded, game-specific correlates of fun and optimize with respect to those. Satisfactorily capturing the entire fuzzy sense of human fun in a finite piece of code is silly, but cataloging the observed feedback for every popular combination of level elements (e.g. "goomba two blocks to the right of a coin block while the player has a fire flower") is both practical and useful for automating level design.


The article didn't really go into detail about the criteria, but there are some easily quantifiable things that are desirable in a game intended to be played 'seriously' (like Backgammon or Go, as opposed to games like "We Didn't Playtest This at All" or Munchkin, which are just trying to be fun rather than legitimately competitive): a minimal first-player advantage, turnabout (the ability for someone to come from behind), and a balance between skill and randomness in games with luck.

I think you could come up with metrics for interactivity, e.g. how much of play is 'goldfishing' versus choices that are notably different based on what the other player has done or will do. Dominion is an example where I think you could show that (for many configurations) an AI that ignores its opponent's actions would still win some significant portion of the time compared to an AI that takes them into account.
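
Some of those are cheap to estimate once you can have bots play the game at all. A rough sketch (play_game, make_bot and the bot constructors are all placeholders for whatever simulation you have):

  import random

  def first_player_advantage(play_game, make_bot, n=1000):
      # Fraction of games won by whoever moves first, with identical bots
      # on both sides. Anything far from 0.5 is a red flag.
      wins = sum(play_game(make_bot(), make_bot()) == 1 for _ in range(n))
      return wins / n

  def interactivity(play_game, aware_bot, goldfish_bot, n=1000):
      # Win rate of a bot that reacts to its opponent against one that
      # "goldfishes" (ignores the opponent). A rate near 0.5 suggests
      # reacting to your opponent barely matters in this game.
      wins = sum(play_game(aware_bot(), goldfish_bot()) == 1 for _ in range(n))
      return wins / n

  # Toy demo: a "game" where the first player wins 55% of the time
  # no matter what, just to show the measurement running.
  toy_game = lambda p1, p2: 1 if random.random() < 0.55 else 2
  print(first_player_advantage(toy_game, make_bot=lambda: None))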


It's natural to question how someone can unproblematically pose a metric on the value of game rulesets (particularly without a generous helping of philosophy and psychology to back it up). However, pragmatically, it's not a silly thing to try. If there were game-design IDEs in the future, we'd like them to provide the equivalent of spell/grammar check -- if an alternate design an edit distance of two away from the current one scored dramatically better on some canned metric, it might be worth putting a human eye on that alternative.

If you think of the error function as the combined output of a bunch of "obvious flaw" detectors instead of a "theory of fun", then speculative optimization where the initial conditions come from a human's design under consideration becomes a potentially interesting bit of design automation. Think of design rule checks in CAD with a bit of fuzziness and a default bias -- instead of saying just yes/no, it can say "X might be a better alternative according to the metrics you've enabled in the preference window, consider adopting some of its edits".
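
A sketch of that grammar-check idea, assuming you already have a score() metric and a list of legal single-rule edits (both placeholders):

  import itertools

  def neighbours(design, single_edits):
      # Every design within edit distance one or two of the current one.
      for e in single_edits:
          yield e(design)
      for e1, e2 in itertools.combinations(single_edits, 2):
          yield e2(e1(design))

  def suggest(design, single_edits, score, margin=0.2):
      # Flag nearby designs that score dramatically better on the enabled
      # metrics, for a human to look at -- not to apply automatically.
      base = score(design)
      better = [d for d in neighbours(design, single_edits)
                if score(d) > base + margin]
      return sorted(better, key=score, reverse=True)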


In the comments, the author of the article refers to a paper[1] that describes the system (Ludi) they're using. They came up with 57 aesthetic criteria, scored games played by humans against them, and compared those scores to the human judgements.

Based on this they picked the 16 best criteria and generated games. The fitness function is a score over those criteria after n automated playthroughs. The generated games were then tested by humans, and they found that the aesthetic criteria and human judgements correlated significantly.

Besides these criteria (they do not describe all of them in depth, but I assume some lean towards the programmer's subjective preferences), they also restrict how long planning a move can take (15 seconds), discarding any game that exceeds this. As you say, that can be seen as designing the search space, but the results indicate that they've come up with a way to effectively measure the "fun" factor of the games they generate.

[1] http://www.cameronius.com/cv/publications/ciaig-browne-maire...
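
To make that concrete, the fitness step they describe might look roughly like this in Python (self_play and the criterion functions are placeholders; I'm not reproducing the paper's actual criteria):

  MOVE_TIME_LIMIT = 15.0   # seconds, the planning cutoff mentioned above

  def fitness(game, self_play, criteria, n_playthroughs=20):
      # `self_play(game)` runs one automated playthrough and returns
      # (trace, slowest_move_seconds); `criteria` is a list of
      # (criterion_function, weight) pairs.
      total = 0.0
      for _ in range(n_playthroughs):
          trace, slowest_move = self_play(game)
          if slowest_move > MOVE_TIME_LIMIT:
              return None        # discard games that are too slow to plan in
          total += sum(weight * criterion(trace)
                       for criterion, weight in criteria)
      return total / n_playthroughs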


That is awesome. Thanks for pointing it out. Science!


Given enough money, you can outsource the error function to Mechanical Turk. Or, to go more meta (and possibly cheaper): make designing a good error function part of a meta GA run, and judge those functions by comparing their output with human judgements.
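
A sketch of judging a candidate error function against human judgements (the reference games and ratings are whatever labelled set you can afford; Spearman correlation is just one convenient agreement measure):

  from scipy.stats import spearmanr

  def meta_fitness(error_fn, reference_games, human_ratings):
      # Score a candidate error function by how well its ranking of a fixed
      # reference set of games agrees with the human ranking of that set.
      machine_scores = [error_fn(game) for game in reference_games]
      correlation, _ = spearmanr(machine_scores, human_ratings)
      return correlation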


Given the effort it takes to play a game and the inter-subject noise in a numerical judgement, I think a better use of Mechanical Turk would be to ask the player to do some blame assignment. That is, instead of rating a game as 4 stars, they could give a thumbs up/down to particular rules or sets of rules in an abstracted representation of the game.

This kind of feedback extracts several more bits of information from the player than a single rating does (making better use of them). However, it breaks the applicability of an evolutionary algorithm that treats both artifacts and fitness evaluation as black boxes. If you use a search algorithm that is aware of how the game is built from components, I'm guessing that component-level feedback (being both more objective and more specific) could provide more informative pressure to drive the algorithm than the standard interactive genetic algorithm setup gets.
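
For illustration, component-level votes could feed back into the search something like this (all names invented):

  import random
  from collections import defaultdict

  rule_votes = defaultdict(int)    # rule -> net thumbs up/down from players

  def record_feedback(rule, thumbs_up):
      rule_votes[rule] += 1 if thumbs_up else -1

  def mutate(game_rules, candidate_rules):
      # Instead of mutating uniformly at random, preferentially drop the
      # most-downvoted rule and try a rule players haven't objected to.
      worst = min(game_rules, key=lambda r: rule_votes[r])
      pool = [r for r in candidate_rules
              if rule_votes[r] >= 0 and r not in game_rules] or candidate_rules
      return [r for r in game_rules if r != worst] + [random.choice(pool)]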


Yes, I also thought of giving multi-dimensional feedback, instead of just a single number, when writing the comment.


Why not go full Turk? Have a Turker suggest a new rule/modification. It will be paid for only if accepted by another, who must play N recorded moves of the game with a friend before passing judgment. Iterate.

Disclaimer: I've never used MT. Is this possible?


A project I briefly explored was to develop an algorithmic model of "agency" in a roleplaying game, with the idea of taking existing plotline generation approaches and using the agency algorithm as a heuristic path-picker through the possible plotline alternatives.


This paper, "Automatic Design of Balanced Board Games" (2007), might be of interest for similar reasons: http://www.aaai.org/Papers/AIIDE/2007/AIIDE07-005.pdf


Thanks!

I've been kicking around the idea of using evolutionary algorithms to test gameplay balance, and it's cool to see that someone else is already working on it.


Before you jump to evolutionary algorithms, note "The Problem with Evolution" section in the OP. This was written by a guy who knows what he's doing with evolution, and it's interesting to see him turning to MCTS given his previous success with evolution.

The gist is that games are fragile with respect to mutations (motivating the need for a search paradigm not based on gradualism). This fragility may not exhibit itself as strongly for the parameter tweaking involved in the balancing problem, but balancing quickly shades into general game design when you start to consider non-trivial tweaks.


Thanks for pointing that out.

The project I was thinking of was going to be a tower defense game, and I wasn't planning on trying to develop any rules automatically. Rather, I was thinking in terms of using an evolutionary system to pick weapon and enemy strengths that lead to balanced gameplay.

So, I would probably get away with it, but I'm wide open to different ideas.


A section of the article says this: "Raf describes two requirements for serendipity to occur: 1. Active searching: The designer should not simply wait for inspiration to strike, but should immerse himself in ideas and look for harmonies between them..."

It reminds me of one of my favorite quotes from Picasso:

“That inspiration comes, does not depend on me. The only thing I can do is make sure it catches me working.” ― Pablo Picasso


The principle of subset contradiction (as in Yavalath, where you win by making four in a row but lose if you make three first) is extremely interesting, and it also seems like it can be applied in a fairly simple, atomic way to various types of rulesets. I wonder if a Monte Carlo ruleset search system could integrate higher-level rule change "principles" such as subset contradiction, and apply them as operations to existing rulesets...
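
Purely as an illustration, such an operator over simple line-based win conditions might look like this (the ruleset encoding is made up):

  def apply_subset_contradiction(ruleset):
      # Given a "make N in a row to win" rule, add the contradicting rule
      # "lose if you make N-1 in a row first" -- the Yavalath pattern.
      variants = []
      for rule in ruleset:
          if (rule["type"] == "win" and rule["pattern"] == "n_in_a_row"
                  and rule["n"] > 3):
              losing = {"type": "lose", "pattern": "n_in_a_row",
                        "n": rule["n"] - 1}
              variants.append(ruleset + [losing])
      return variants

  print(apply_subset_contradiction(
      [{"type": "win", "pattern": "n_in_a_row", "n": 4}]))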


Not unrelated is Kevan Davis' Ludemetic Game Generator (2003), which randomly combines categories and mechanics from those at BoardGameGeek to create new (and largely useless) game ideas, with arbitrarily appropriate titles:

http://kevan.org/ludeme

Some of the random games sound more fun than others. ;)

  Game: "Indkub"
  Categories: Industry / Manufacturing, Comic Book.
  Mechanics: Set Collection, Hand Management.

  Game: "Ugplay"
  Categories: Prehistoric, Trains.
  Mechanics: Acting, Trading.


Great thoughts, and very well explained. Neat to see BoardGameGeek get linked on HN -- I've been a BGGer for many years.


A friend of mine implemented the GA-generated game Yavalath for Android. You can find it on Android Market at https://market.android.com/details?id=boardgamer.yavalath.

Yavalath has surprising depth, but it is no Battlefield 3.

It is nice to see BoardGameGeek listed on HN.


Hmm, I recall Trillion Credit Squadron back in the '80s, where the use of computers was almost required to play and being at the bleeding edge of AI research would have been useful.


Thanks for providing the name. I've been trying to remember that story for a while now, but my google-fu was weak and I couldn't recall any name involved (person, program, or game).

I think you're referring to Dr. Douglas Lenat's Eurisko program. He applied it to the game Trillion Credit Squadron and won two years in a row with fleets the other participants scoffed at. The first year, he/Eurisko won by creating a largely stationary fleet that could take enough damage to survive long enough to destroy its opponent. It exploited a damage rule in the competition: the damaged party got to select which component/subsystem was damaged. Eurisko constructed a fleet with many small, useless components which could be individually destroyed without impacting the effectiveness of the craft themselves.

Edit: Reading more about it, apparently the primary advantage was the sheer number of craft: essentially a fleet of kamikaze craft that could overwhelm the enemy.

http://en.wikipedia.org/wiki/Douglas_Lenat

http://en.wikipedia.org/wiki/Eurisko


I'm in grad school for AI right now as a result of reading about Eurisko as a kid. Though I haven't followed up on the greater vision, my thesis proposal a few years ago pitched the project of building a Eurisko-style discovery system for game design: http://users.soe.ucsc.edu/~amsmith/proposal/amsmith-proposal...

The fact that I had a good experience playing an "academic" class character in Traveller during college was also an influence on my decision to go to grad school. I haven't seen this class in any other RPG.


Yes, that was what I was thinking of. I was always struck by the fact that the book (in 1980) had the line "a personal computer would be useful".

Ironically, Elite was a rip-off of the basic starship combat rules in Traveller, and I did try to write a game on our PDP 11/03 / VT55 using Traveller as a base.



