A Better Strategy for Hangman (datagenetics.com)
215 points by icefox on March 29, 2012 | 62 comments



An even better strategy would be to weight your guesses toward the most information gain. If a letter appears often but is in the same position most of the time, getting it won't help you as much as getting a letter that appears in fewer words but in more varied positions. A quick check of 5-letter words in my /usr/share/dict/words file gives these position counts:

S: 659, 62, 267, 260, 2282
E: 122, 829, 448, 1254, 679
A: 252, 1201, 678, 410, 307
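As a sketch of that check (the word-list path and the counting scheme are illustrative assumptions, not the commenter's actual code):

```python
def position_counts(words, length=5):
    """For each letter, count how often it appears at each position
    in words of the given length."""
    counts = {}
    for w in words:
        w = w.lower()
        if len(w) != length or not w.isalpha():
            continue
        for i, ch in enumerate(w):
            counts.setdefault(ch, [0] * length)[i] += 1
    return counts

# Hypothetical usage against a system word list:
# words = open("/usr/share/dict/words").read().split()
# position_counts(words)["s"]  # skewed toward the final position
```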

That is, S, when it appears, is heavily skewed toward the last letter of the word; finding it won't help you figure out the word nearly as well as finding the more evenly distributed E or A.


Doesn't this all assume that the opponent is choosing a random word? If they're trying to be evil, I'm sure they can pick words where each letter you get gives little information.

For example, which letter do you guess if the puzzle is _og? You still have to go through bcdfhjln even after you've burned however many guesses getting that o & g.


I once wrote a hangman game that did this in reverse: the computer tried to make you as wrong as possible, while still answering your guesses consistently.

It had one problem: it always tried to make you guess wrong, but by carefully choosing your letters you could force it to exclude tons of words (because they contained the letter you guessed), leaving only a small number of possibilities.

It needed a refinement to occasionally allow you to guess correctly if that maximized the number of possible words remaining that still fit the prior guesses.
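This refinement is essentially the classic "evil hangman" idea: at every guess, partition the remaining candidates by the pattern of positions the letter would occupy, and answer with whichever pattern keeps the candidate set largest, even when that means conceding the letter. A minimal sketch (my own, not the commenter's code):

```python
from collections import defaultdict

def evil_respond(candidates, guess):
    """Partition candidate words by the positions at which `guess`
    occurs, then keep whichever group is largest -- even if the
    largest group actually contains the guessed letter."""
    groups = defaultdict(list)
    for w in candidates:
        pattern = tuple(i for i, ch in enumerate(w) if ch == guess)
        groups[pattern].append(w)
    # The pattern that leaves the most words still in play.
    pattern, remaining = max(groups.items(), key=lambda kv: len(kv[1]))
    return pattern, remaining
```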


This is a good insight. To formalize it: each question segments the words into 2^max_word_length groups of various sizes, and we seek to minimize the worst case number of questions still needed, regardless of which answer we get back.

Which, for the ispell word list, appears to be 'e' :( Edit: that was also the answer found by the suboptimal method in the article, and it is absent from 25% of the words in the corpus.

code at http://pastie.org/3693750


Not quite; you don't want to minimize the worst case, but the expected case. That comes from maximizing the entropy of the result set ( https://en.wikipedia.org/wiki/Entropy_(information_theory) ).

To calculate this, take the sum of -p * log(p) for all patterns of the guessed letter, where p is the proportion of words in the candidate set that have this pattern.
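That calculation could be sketched like so, grouping words by the tuple of positions at which the guessed letter appears (the empty tuple meaning a miss):

```python
import math
from collections import Counter

def letter_entropy(candidates, guess):
    """Entropy (in bits) of the partition induced by guessing `guess`:
    the sum of -p * log2(p) over the letter-position patterns, where p
    is the fraction of candidates showing that pattern."""
    n = len(candidates)
    patterns = Counter(
        tuple(i for i, ch in enumerate(w) if ch == guess)
        for w in candidates
    )
    return -sum((c / n) * math.log2(c / n) for c in patterns.values())
```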


> Not quite; you don't want to minimize the worst case, but the expected case.

That depends on your opponent. In game theory you often assume that your opponent plays adversarially and as well as they can, so you'd plan for the worst case.


That's only true if the worst case still lets you win the game. Otherwise, your opponent will pick the worst case word every time and you lose. As a zero-sum game, if the players agree on a dictionary beforehand, there is some Nash Equilibrium; I don't have the skill to figure out what it would be, but there would have to be some nonzero probability of guessing every word in the dictionary before running out of guesses.


Interesting. Thanks for your comment here and the one above. Although I firmly believe that hangman is a game, and hence the assumptions of game theory are in full effect, I had not considered the information theory take on things :)

Since you asked about the Nash equilibrium for this case, here it is:

In the event that you do not have enough questions to uniquely identify every possible word (and assuming that words with more letters than you have guesses are not allowed, i.e., every word can be identified with the right questions), the game becomes a matching game [http://en.wikipedia.org/wiki/Matching_pennies].

First, both players work out the minimum number of question paths that can identify every allowed word. Any word in more than one set is ignored by the hosting player as a dominated strategy. Then the guessing player picks randomly among the question paths, and the hosting player picks randomly among the disjoint sets of words at the ends of those paths, with the specific word not mattering.

You can see that once both players are picking between several options, with the entire game riding on their choice, the actual letter guessing becomes unimportant. The game itself resembles an unfair rock paper scissors.

If the players were betting $1 per round, the expected value to the guesser is ($2 divided by the number of disjoint sets) - $1.


How does this work if there are multiple covering sets with the same (minimal) number of question paths? Does it matter which minimal covering set you evaluate, or does your strategy need to be some kind of superposition derived from all the minimal sets?

Also, if there are some words that are never picked by your strategy, doesn't that open the door for an alternative set of question paths that only covers the now-reduced dictionary? (cue reference to the Unexpected Hanging Paradox)

In many ways, the entropy is a heuristic for finding a good question path given the known probabilities for each word in the dictionary. The formula that I quoted above can be trivially extended for nonuniform distributions of the chosen words. It sounds like the Nash equilibrium would be designed such that (after eliminating dominated strategies), the expected information gain at each step would be the same for all question paths. I wouldn't want to have to prove it, though.


I buy that information gain is the same as the optimal strategy, so long as you are taking into account your opponent's optimal strategy with a weighted probability distribution.

The unexpected hanging paradox is a great link, thanks :) awesomely enough though, game theory handles this problem! This is the "why" behind minimax. If John Nash were sentenced by that judge, Nash would decide what day to expect by flipping a coin, and the judge would segfault :p

To answer your question though, if there are multiple paths to the same words (eg Tree(1) leads to Union(a,b), Tree(2) leads to Union(b,c), and Tree(3) leads to Union(a,c)), that surprisingly doesn't make the solution any harder... though my links above aren't particularly useful for it.

To find the optimal strategy for the player choosing the word: for every possible set of words (all 2^n - 1 nonempty subsets), calculate the opponent's optimal strategy against someone picking uniformly at random from that set, and then choose the set that yields the lowest expected value for your opponent.

Make sense? [edit: NO! I didn't show that optimal would necessarily come from a uniform random selection from the word bag... and a proof doesn't jump to mind, though my intuition tells me so]

[edit 2: YES! Just assume that the strategy doesn't need to be uniform [though I still maintain it will be], and use gradient descent to find the optimal weighting]

You seem like the sort of person who would LOVE reading about the computational roshambo competition run out of UBC a while back (specifically the iocaine powder overview at http://webdocs.cs.ualberta.ca/~darse/rsb-results1.html). You might also like http://wiki.mafiascum.net/index.php?title=WIFOM

Also, if you haven't been corrupted by LessWrong yet I highly recommend it :p


That's not even close to a practical algorithm, because (1) the set of words is huge and its powerset is astronomically bigger, and (2) you didn't describe how to find the optimal counter-strategy. And actually, I don't think it's necessary to consider any subset of words.

The usual way to find a Nash equilibrium is with linear programming. But for that you have to construct a matrix of size #(deterministic strategies for player 1) * #(deterministic strategies for player 2). Player 1 does not have that many strategies (just the number of words in the dictionary), but player 2 has an enormous number.

So something else is needed. We could try to write an algorithm for the expected value of the optimal counter-strategy when player 1 plays a given mixed strategy; call it ev(s), where `s` is a game state. Then ev(s) = max_{s' in successors of s} ev(s'). Perhaps this can be done with dynamic programming, but it could lead to far too many states being evaluated; if you could give an approximate upper bound, some paths could be discarded early. The Nash equilibrium is then found by minimizing this ev over all mixed strategies of player 1, i.e., all probability distributions over the words. How to do that is another tricky question.

One piece of positive news is that we can consider each word length separately: the final Nash equilibrium will always choose words of a single length for player 1, i.e., the optimal strategy for hangman always chooses a word of some length n (we don't know n in advance; it depends on the dictionary).


Sorry about that. I got too caught up in showing that there was in fact an optimal strategy, and no hangman paradoxes. In response to (2), that was covered in the GGP: you select the question that, over all words, gives you the least distance from the answer (and if that doesn't cover every answer, then the minimal number of questions to do so). I posted Ruby code in the GGP for a heuristic version; the non-heuristic version would look similar.

What follows is my solution to this game after having a night to sleep on it. It's O(26^26 times words) [because it requires you to construct the optimal move for the guesser], so perhaps only of interest to myself.

Problem restatement: Imagine instead that we are playing a game where, for doors 1 through m, we must pick a door and hide some prize behind that door. Our opponent must then choose from a list of moves that open ranges of doors.

That problem is relatively easy to solve. First, form the set of transitive closures on moves, such that any move sharing a door with another move is in the same closure. Next, for each closure, mark every door with the number of moves that can reach it, and progressively remove the doors with the highest mark until doing so would leave no doors. If at any point this process splits a closure, reform the closures, add them to the list, and process them again in this manner.

Observe that for any closure remaining, for every pair of doors (a,b) reachable by the same moves, we can discard the second without changing the optimal strategy (because any move the guesser made would be the same to us for both a and b).

Now observe that for any closure remaining, if the dictionary included only the words reachable from that closure, picking uniformly at random among the doors not removed by counting or pairing would be optimal (as the same number of moves can reach any remaining door). To prove that moves with higher marks would only decrease our EV, simply observe that if we did include any door with more marks, then those moves could be favored by the guesser to increase the guesser's chance of winning.

To complete the solution, we work out the expected value of playing each closure independently, and select between them in proportion to the inverse of their EVs. (So if we had two closures, one with EV -1 [if we could only choose from the words behind these doors, we would always lose], and the other with -4/5 [all doors can be reached by 4 of the 5 moves in this closure], we would pick between the two in a 4:5 ratio.)

For a proof that the above mixing is optimal, see the analysis for the game "Bluff".

Anyone who understood all that, lives in San Francisco, Santa Barbara, or San Diego, and would like to meet up to discuss economics, game theory, algorithms, machine learning, computer science, or world domination over coffee is cordially invited to email me at dls@lithp.org (yes, that's "lisp with a lisp"; perhaps puns should have been on that topic list as well :p)


That would be trading risk for gain.

Remember, if you pick a correct letter you do not lose a guess; in other words, you get another guess when you're right. It's not about finding the word in a fixed number of guesses.

But otherwise you could also extend your method to increase the weight of letters appearing more than once in a word since that gives even more information gain.


I won't call out the company because, at least until recently, they were still using the question (I was accidentally cc'd on a subsequent candidate's email), but last summer I got an interview homework problem from a "who is hiring" post that was to solve hangman for a given dictionary. I implemented all of the strategies outlined here, and while they seemed impressed with the execution speed of my solution, they implied it could have scored better, saying it was only "average" in that regard.

The thing is, at least for the dictionaries I had available, weighting for information gain rather than just guess correctness wasn't all that great a strategy. Even a letter you are 100% certain is correct is very likely to cut the search space down by virtue of the pattern of positions it reveals, and in aggregate my strategy made no statistical difference (+/- 1% depending on random seed).

If I were going to try to improve my scores, I guess the next thing would be to look at the distribution of letter patterns for each remaining letter, but my intuition is that it just wouldn't be a big deal.


I loved this blog post (and the other one I read, about Battleship), so I wondered who was behind these great posts.

Their homepage says this "DataGenetics is a technology consultancy specializing in unlocking the value stored in large databases. Using a variety of techniques we can mine the trends in your data to help you maximize your marketing and advertising campaigns."

Damn.

Yet another group of smart folks focused on more-effective advertising.


That's where the money seems to be. It's a shame, but I can't blame groups like these.


Why not? Do you think people have no responsibility for their choices other than to maximize their financial gain?


No. I don't believe that a person's intelligence level makes it appropriate for any other party to decide what she should and should not do. I also question what "responsibility" an intelligent person has in their career choices. Responsibility to whom? Society? Should they be out curing cancer instead? I cannot accept the notion that people in the aforementioned position have a responsibility to anybody but themselves and their families.


I never said anything about intelligence level or other parties deciding what people should do.

Are you able to consider the idea that someone's responsibility to themselves and their families might lead them to make decisions other than maximizing financial gain?

Also - I'm curious as to why you think family members are a special case?


I think you may be reading my previous comment without considering its context in the thread. I was initially writing in response to the observation that this was "Yet another group of smart folks focused on more-effective advertising." It wasn't clear to me that you and I pivoted from this.

Of course I'm able to consider the idea, rbarooah. What I reject is the notion that we can project our own values on a person and then blame him for not fulfilling them (thus my original comment).


I do recall the context.

What were you trying to convey with the phrase "It's a shame?" Was that not your judgement of their choice?

Also, how do you propose to prevent other people from blaming one another for things?

You didn't answer my question about why family members are exempt from your general view - which I am genuinely curious about.


It's an interesting article, but I would have thought that the best* letter to guess is the one that appears in as close to 50% of the possible words as possible, rather than the most likely to appear, given that the hangman is drawn per error, rather than per guess. Is my reasoning wrong?

* assuming as he does at the bottom the existence of a computer.


Your intuition is telling you that you should shoot for 50% so that you can divide the search space in half regardless of the outcome. A right or wrong answer will eliminate half of the possible remaining words.

But right and wrong guesses are not equally valuable in hangman. A wrong guess will eliminate all words of a given length containing the given letter. But a correct guess gives you more: you get the location (or locations) of the letter in the word as a bonus. This will reduce your remaining search space drastically.

Thus, you should optimize for the success of your next guess.


Additionally, you're not penalized for correct guesses.


It depends whether you care about minimising the number of errors, or just wish to keep it below 11 - thereby minimising your chance of losing. These are subtly different and depend on your "reward function".


If you want to minimize the number of errors, you should try the letter with the most information. In other words, minimize the search space. That is what hythloday suggested.


I do not think that is correct. Let's say that, at some time, you know the word has an E, with 99% probability at position 1 and with 1% at position 2. You also know there is 50% chance that the word has an O. I would guess the E, because it comes free, even though it is not the one that gives the most information.

You must somehow weigh information gain against risk.


Assume there is always a letter with a 50% chance and we choose wrong 11 times in a row; one more and it's game over. At that point we have 11 bits of information, which is enough to distinguish between 2048 words.

Actually, we should be correct 50% of the time, which means 22 bits of information, or 4,194,304 words.

Additionally, we know the length of the word.

English dictionaries seem to have between 400k and 1000k words [0] across all word lengths, and with 22 bits we can distinguish over 4000k. We do not have to worry about getting hanged using the information-reduction algorithm. ;)

[0] http://hypertextbook.com/facts/2001/JohnnyLing.shtml


I have to correct myself: you do not want a 50% chance, because a correct answer provides more than 1 bit of information. If we get it right, we also learn the position(s) of the letter.

More precisely: given a dictionary of all possible words and a letter to guess, we can split the dictionary into k parts. One sub-dictionary contains all words for which the answer is negative; the other k-1 sub-dictionaries correspond to the equivalence classes of letter positions.

For each letter, we can compute this partition. Now we need to choose the strategically best partition; I believe it should be the one with the lowest average sub-dictionary size.
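One way to score that "lowest average sub-dictionary size" criterion: if the hidden word is uniform over the candidates, a word in a part of size s leaves s candidates, so the expected remaining size is sum(s^2)/n. A sketch, with `best_letter` as a hypothetical helper name:

```python
from collections import defaultdict

def expected_remaining(candidates, guess):
    """Expected number of candidates left after guessing `guess`,
    assuming the hidden word is uniform over `candidates`: a word in
    a part of size s leaves s candidates, so E = sum(s^2) / n."""
    parts = defaultdict(int)
    for w in candidates:
        parts[tuple(i for i, ch in enumerate(w) if ch == guess)] += 1
    n = len(candidates)
    return sum(s * s for s in parts.values()) / n

def best_letter(candidates, unguessed):
    # The letter whose partition shrinks the dictionary the most on average.
    return min(unguessed, key=lambda g: expected_remaining(candidates, g))
```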


The simplest way I know of to do that is by a weighted linear combination. Both information gain and risk are quantifiable; one in bits and the other in probability (or log-odds which is also bits). Then the problem reduces to finding appropriate weights, which can be done with your favorite n-dimensional search algorithm.

It may also be useful to consider how many remaining incorrect guesses there are in the game.


If you guess a letter contained in 99% of remaining possible words, then either

- you are correct (happens 99% of the time). No penalty, and you find out at which positions your guessed letter occurs, a decent information gain.

- you are incorrect. You suffer a penalty but eliminate 99% of the remaining words, a massive information gain.

How would 50% be preferable in either case?


No, we should still go for the letter with the highest frequency.

If we get a hit, we learn not just the presence of the letter but also the number of hits and the positions of those hits. These are incredibly valuable pieces of information (more valuable than dividing the remaining words into two equal piles).

If we miss, we've carved away the biggest set of now-impossible words with a single guess.


I kept waiting for the other shoe to drop, and it never did.

This is great if you're playing hangman against a computer. Probably, you're not.

You're playing against a person, who probably has some sense of your strategies.

Even when I was a kid, we didn't play hangman by choosing random words. What's the fun in that? You notice how the other person plays, and in the next rounds you pick words that break their strategies.

You notice they are doing the "common letters" or even "common vowels" strategy, and you give them "lynx".

Or you trick them with a word that has a few easy-to-get letters, but is going to be hard to guess the last few because there are so many possible matching words -- I had a great one that I forget now... maybe "budder"?

Did everyone really play with randomly-chosen words? What a waste.


Clearly we should have a hangman challenge similar to the RoShamBo challenges people used to run to explore possible strategies in a multi-turn game.

http://webdocs.cs.ualberta.ca/~darse/rsb-results1.html


Great comment - in hangman, choosing letters solely on the basis of data (no matter how complete the data or how compelling the study) is a highly predictable (and thus highly defeatable) strategy.

Of course, this analysis has a counterpoint. If I try to pick words strategically, I suspect that my words would begin to follow patterns...


To be sure. I recall we'd mostly stick with the two strategies I mentioned (words with most rare letters, or words in very common patterns)... then when you're settling into a pattern (and your opponent's first guesses are "x" or "q" or similar), throw in something like "teat" to throw them for a loop.

We did have one minor rule change -- definitely more than 11 line segments to hang the man. If you had a really good word, and wanted to draw out the torture, you'd start drawing in facial features, fingers...

It also adds a bit of artistic fun to the game; you don't even have to be hanging a "man" if you want to get creative; it could be a giant insect, horse, whatever.

It would still be pretty clear who was "winning" by how long the string of crossed-out letters were by each round.


You can repeat a similar analysis to theirs against an opponent who chooses carefully and knows your strategy. With the theory of two-person zero-sum games you can even prove that there are perfect strategies for both the guesser and the chooser that don't get worse if the other party knows them. (Hint: the strategies will involve random selection with carefully weighted probabilities.)


My favorite strategy: four letter word. Say no to everything until they guess U: _ U _ _ then decide on the least likely word which includes none of their guesses, often "junk".


That isn't a strategy, that is cheating. ;)


I cheat the same way by selecting "_OO_". Guessers get frustrated because, intuitively, it doesn't look like there are 30+ English words matching this pattern: root, fool, loom, noon, etc. :-)


If you play that way in Dutch, you can use three letters and even give away that the word ends in 'ak'. That leaves at least BDGHJLMPRTVWYZ for the first letter (A and K give valid words, too, but will already have been guessed)

In general, shorter words are harder to guess in hangman.


This is great, but it's just a first-order strategy! Once your opponent knows you're following these (quite rational) rules, she could pick words where the rules fail particularly badly.

I look forward to seeing your analysis of how to deal with that :)


The strategy given isn't to solve the puzzle, but to get at least one letter on the board. If you look at the table at the end, the worst case is a three-letter word that contains only a K out of the letters given, which is the worst case on the entire board. Assuming your opponent sticks to the dictionary, there is no way for them to ruin this strategy, only to push you into those worst cases, which, once you figure out what's going on, plays to the guesser's advantage, not the word-chooser's.


I was hoping for a little list at the end saying "By the way, if you're playing against someone who's using this strategy, the best word to play is: xyzzy" (or whatever word it actually would be)


This came up as a pre-interview employment challenge that I did a few months ago. I've uploaded my code here: https://gist.github.com/2242816.

I could make good arguments for both "most likely to be present" and "most information gain", so I implemented a mixed strategy. I calculated the information gain (in bits) and the probability of a correct guess, then used a linear combination of them to determine which letter to guess. Additionally, the weights changed over the course of the game so that correct guesses were valued more as incorrect guesses became more scarce.

To determine appropriate weights, I ran a crude Monte Carlo simulation. The final weights I ended up with were 0.60:0.03 with no incorrect guesses and 0.54:0.69 once there aren't any strikes left. (bits information gain : p(correct guess))
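A minimal sketch of such a blended score. The endpoint weights are the ones quoted above; the linear interpolation between them by strikes used, and the `max_strikes` parameter, are my assumptions, not the commenter's actual scheme:

```python
import math
from collections import Counter

def guess_score(candidates, guess, strikes_used, max_strikes=6):
    """Blend of information gain (bits) and P(letter present); the
    correctness weight ramps up as incorrect guesses run out."""
    n = len(candidates)
    patterns = Counter(
        tuple(i for i, ch in enumerate(w) if ch == guess)
        for w in candidates
    )
    bits = -sum((c / n) * math.log2(c / n) for c in patterns.values())
    p_correct = sum(1 for w in candidates if guess in w) / n
    t = strikes_used / max_strikes          # 0 early, 1 on the brink
    w_bits = 0.60 * (1 - t) + 0.54 * t      # endpoint weights quoted above
    w_correct = 0.03 * (1 - t) + 0.69 * t
    return w_bits * bits + w_correct * p_correct
```

The best guess is then simply the letter maximizing this score over the remaining candidates.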


> There are no two letter words containing the letter C, Q, V or Z.

Does 'qi' count? http://www.merriam-webster.com/dictionary/qi

I know it counts when playing Words with Friends...

I also think 'za' is valid when playing Scrabble type games, but 'za' technically isn't a word.


qi and za are recent additions to the Scrabble dictionary, but in any reasonable friendly word game they wouldn't count because they're not in common usage. IMHO they also significantly changed the gameplay of Scrabble for the worse by making it much easier to get rid of the z and q tiles.


IIRC "zo" is also valid. It's a Tibetan yak hybrid.

Hmmm... Yak butter tea.


In case you cared, "zo" is valid in the UK/international Scrabble wordlist, but not in the US (yet, until we persuade them to use the same wordlist as everyone else).


Good comments. Being paranoid, I double-checked the database I used! Neither 'qi' nor 'za' is in it.

I'm not arguing that they are, or are not, words. I'm just saying that they were not in the dictionary file I used :)


Cool, but it assumes all words have an equal probability.

Against a weak player, the ideal word list should only include 'common' words.

Against an ideal player, you need to assume they will pick from the list of words that you are least likely to guess under optimum play.

Generically, optimum play ends up with a somewhat randomized list, e.g. 90% of the time pick E, 10% of the time pick I, etc.

PS: Actually generating this list is a 'hard' problem and very dictionary-dependent, but you can probably get reasonably close using some sort of genetic algorithm and enough simulation time.


> PS: Actually generating this list is a 'hard' problem and very dictionary-dependent, but you can probably get reasonably close using some sort of genetic algorithm and enough simulation time.

Sounds like a challenge. I don't think, given a dictionary, finding the ideal strategy for both parties will be that challenging. It's a fairly straightforward two person zero-sum game. You can either model it assuming hidden information, or concurrent play. (Which is basically the same here.)

As an interesting variation, you might allow the chooser to cheat: i.e., don't make them write down the word in the first place, just require their answers to be consistent.


Two possible improvements that I think may be original (though I probably missed them in the article and comments):

1) A hit or miss on a letter tells you a lot about the new optimal guess. You could make a (really big) flow chart that shows the optimal letter to call out next based on what has hit and missed so far.

2) The positions where a letter hits tell you a lot about the target word. If you have a computer, a regex makes it trivial to identify the next optimal letter to guess. An 'optimal' table could be generated based on this, though it would be a huge table.
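The regex construction in (2) might look like this. The rule that blanks exclude every guessed letter (since a hit reveals all positions of that letter) is an assumption about standard hangman play:

```python
import re

def candidate_regex(board, missed):
    """Build a regex from the board: revealed letters stay literal,
    blanks become a character class excluding every guessed letter."""
    guessed = set(missed) | {ch for ch in board if ch != "_"}
    blank = "[^" + "".join(sorted(guessed)) + "]" if guessed else "."
    return re.compile(
        "^" + "".join(blank if ch == "_" else ch for ch in board) + "$"
    )

# Toy example: board "_og" after a missed guess of "t".
words = ["dog", "dig", "fog", "log", "lot"]
rx = candidate_regex("_og", missed="t")
remaining = [w for w in words if rx.match(w)]
```

Counting letter frequencies over `remaining` then gives the next optimal guess.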

I kept feeling like every step of the way it was an infomercial, 'but wait... there's more!' and I like that, it got me thinking.

I'd love to have my hangman bot go head to head with yours on random words. that would be a fun little project.


Very cool. I was implementing hangman a few weeks back and figured out the first strategy: use the frequency distribution for the specific length to initiate your guessing. However, I considered but discarded the second strategy of conditioning on the previous wrong/right guesses, selecting the matching words, and then building a histogram of their letters. Thinking back, I find it interesting that looking at it from a complexity point of view made me dread that approach: it is significantly more expensive to recompute frequency distributions than to blindly use the naive independence assumption, and I was not sure how much of a win that bit of optimization would bring.


Instead of a table at the end, what if you boiled it down to a single list of letters that I could easily memorize? You could do this by weighting the final table by the number of 1-, 2-, 3-, etc. letter words in the dictionary.

Put another way, what are the letters I should guess when I don't know the length of the word?

While it wouldn't be perfect strategy, it would be easier to execute.


There are lots of good further caveats in this thread.

We should have a hangman AI tournament.


Does this mean that RSTLNE on wheel of fortune is non-optimal too?


If you aren't taking the length of the word into account, then RSTLNE looks like the optimal choice of 5 consonants and 1 vowel. Look at his section titled "First refinement": the most common letters were (in order) ESIARNTOL. Eliminate all but the first vowel and you get ESRNTL, the same letters as WoF.


Given that your perception can generally fill in the vowels, picking any of them early on would seem non-optimal.


I think it's more a goal of eliminating uncertainty about which letter occupies a slot.

For example, if I have guessed T and E and see _ _ E as the word, I know it's not the word "the".


This seems like a problem ripe for solving with a Markov model.


Playing hangman against Sheldon Cooper was a mistake in the first place :)



