Hacker News
The Mathematics of 2048: Optimal Play with Markov Decision Processes (jdlm.info)
312 points by jonbaer on April 9, 2018 | 43 comments



As someone who studies reinforcement learning (or at least attempts to), this article was expertly written and illustrated. The graphics were phenomenal at explaining how MDPs function. The text was written with enough intuitive explanations to guide the reader but provides the correct notation for the reader to investigate further. If I had any complaint, it would be that the author didn't mention backwards induction or dynamic programming when describing the solution method in Appendix B. Still, I think this was the best intro/case study of standalone MDPs I have ever read. Great work!
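For anyone who hasn't seen it, backward induction on a problem like this looks roughly as follows: each move adds a 2 or a 4 tile, so the sum of the tiles strictly increases, and you can process states in decreasing order of tile sum so that every successor is already solved when you need it. A minimal sketch (not the article's code; `actions`, `transitions`, `tile_sum`, `is_win`, and `is_loss` are hypothetical helpers, and the value is taken to be the probability of reaching the target tile):

    def backward_induction(states, actions, transitions, tile_sum, is_win, is_loss):
        """Solve an acyclic MDP layered by tile sum (sketch with hypothetical helpers).

        transitions(state, action) -> list of (probability, next_state)
        Value of a state = probability of eventually reaching the winning tile.
        Assumes every non-terminal state has at least one legal action.
        """
        value, policy = {}, {}
        # Largest tile sums first, so successors are solved before their predecessors.
        for state in sorted(states, key=tile_sum, reverse=True):
            if is_win(state):
                value[state] = 1.0
            elif is_loss(state):
                value[state] = 0.0
            else:
                best_action, best_value = None, -1.0
                for action in actions(state):
                    # Expectation over where the random 2 or 4 tile lands.
                    expected = sum(p * value[next_state]
                                   for p, next_state in transitions(state, action))
                    if expected > best_value:
                        best_action, best_value = action, expected
                value[state], policy[state] = best_value, best_action
        return value, policy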


I haven't read the article yet, but I'm definitely going to; what a great review! This part is the kicker:

> The text was written with enough intuitive explanations to guide the reader but provides the correct notation for the reader to investigate further.

So many explanations about complex or in-depth topics have difficulty finding the sweet spot between being too basic and hand-wavy on one end of the spectrum, and too obtuse and difficult to follow for a layperson on the other end. I've particularly noticed this with articles on quantum mechanics and relativity in the past. It's either complex mathematical equations or trains and flashlights (I'm exaggerating a bit).

When I read an article on such a topic that is accurate yet understandable, gives the reader the information and keywords they need to learn more, and provides a good on-ramp into the topic, it really stands out.


abstruse not obtuse /nit


Ugh, I always forget; I don't have a lot of occasions to use it, so it's hard to break the habit. I guess I'm just obtuse.

Thanks for pointing it out.


(Author here.) Thanks for the kind words! I'm glad to hear that. I hope this article will do some good by introducing more people to MDPs --- they are incredibly useful.

That is a good point about mentioning backwards induction in Appendix B, because that's exactly what I used! I'll add a note about it.


I feel compelled to write every time 2048 comes up: 2048 is a shameless rip-off of another game called "Threes" which came out just before it. "Threes" is more or less a game-design masterpiece, far superior in every way (graphics, sound, design, gameplay, you name it) to 2048. The reason 2048 is more popular is that it was a free clone of Threes!, and people refuse to pay even a little bit of money on mobile platforms for games.

This is nothing against this article, of course, which is an excellent write-up. But: if you like 2048, you'll like Threes better, and the creators of Threes deserve to be rewarded for coming up with the design.


For folks who are interested in what went into designing it, the devs published their emails over the course of development. It's a pretty neat glimpse into how a game's design evolves, as well as the effort it takes to come up with a clean, "simple" mechanic: http://asherv.com/threes/threemails/


It's interesting that he says only 6 people in the world have ever seen a 6144 tile. Automatic controllers can reach it fairly often: https://arxiv.org/pdf/1606.07374.pdf reports reaching it in 7.83% of tries.


Maybe it was inspired by, or stole, the core concept, but 2048 is, to me, much more fun than Threes (I've logged a couple of hours on Threes and many dozens of hours on 2048; I paid for Threes and kinda hate the 2048 app I use, but it works - it's not the "free" quality of 2048 that keeps me playing it).

It's fair to say 2048 is a shameless rip-off, but that doesn't mean it's not better in ways that matter.


The "fanboyism" around these two games is crazy. 2048 might have been a knockoff, but it's a very different and unique game. I actually really dislike Threes because it's a "play until you get screwed by the RNG" type game. I gave up on Threes after about an hour (I didn't "refuse to even pay a little") and went back to 2048.

In 2048 you can keep playing far longer and are much more in control. Sometimes you get a number in a bad spot, but some strategy and additional luck can get you out of many of them. The sliding mechanism also behaves differently, which significantly changes the gameplay.


It's pretty unfair, in my opinion, to gate your acceptance of 2048's theft on which game you like more.


Why is it "theft" with these two games though?

Mario "stole" from Snokie which "stole" from Quest for Tires which "stole" from Jump Bug which "stole" from Wanted: Monty Mole which "stole" from Donkey Kong which "stole" from Space Panic.

The entire history of video games is taking an existing concept and adding small tweaks to create something new. Threes/2048 is no different, but for some reason an incredible amount of emotion and vitriol is tied up in this one.


I'm not sure how Mario "stole" from Donkey Kong if both games were made by Nintendo...


I wouldn't characterize "stole from" as a transitive relation.


The games are nowhere near identical, which means it's not theft.


If I remember correctly Threes is free now as well.


You're right, both are free and if somebody hasn't tried either yet, I would say download both.

I had played 2048 first so when I tried Threes, it didn't feel right to me at all. If I had tried Threes first, I probably would have had the opposite opinion.

Some of the rules are slightly different (I don't remember how), but the two games are pretty similar. Other than rule differences, the sounds in Threes were very grating to me. I ended up muting all of the sound effects and music. One huge plus to 2048 (IMHO) is that it was released as open source.


It's not a "shameless rip-off".


"Created by Gabriele Cirulli. Based on 1024 by Veewo Studio and conceptually similar to Threes by Asher Vollmer."


Conceptually similar is not theft. It's a significant improvement on a very simple game.

It's like comparing Chess and Checkers. Similar, sure: 8x8 board, move one piece at a time, promote on the other side of the board, etc. But they play as very different games.


As the author points out in the conclusion, the state space blows up very quickly as the grid becomes larger.

There is a large class of algorithms for finding approximately optimal solutions to MDPs[1] that are model-free or stateless, meaning you don't need to enumerate all of the state-to-state transitions to get a good policy.

If you google 2048 reinforcement learning[0], you'll find lots of implementations of such algorithms.

[0] https://www.google.com/search?q=2048+reinforcement+learning

[1] https://en.wikipedia.org/wiki/Markov_decision_process#Algori...
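To make "model-free" concrete: something like tabular Q-learning never touches the transition probabilities at all, it only needs to sample transitions from the game itself. A minimal sketch, assuming a hypothetical Gym-style `env` with `reset()` and `step(action)` and hashable states; real 2048 agents replace the table with a function approximator such as an n-tuple network, but the update is the same idea:

    import random
    from collections import defaultdict

    ACTIONS = ['up', 'down', 'left', 'right']

    def q_learning(env, episodes=10_000, alpha=0.1, gamma=0.99, epsilon=0.1):
        """Tabular Q-learning sketch. `env` is a hypothetical 2048 environment with
        reset() -> state and step(action) -> (next_state, reward, done); it is only
        ever sampled, never enumerated."""
        q = defaultdict(float)  # q[(state, action)] -> estimated return
        for _ in range(episodes):
            state, done = env.reset(), False
            while not done:
                # Epsilon-greedy action selection.
                if random.random() < epsilon:
                    action = random.choice(ACTIONS)
                else:
                    action = max(ACTIONS, key=lambda a: q[(state, a)])
                next_state, reward, done = env.step(action)
                # One-step Q-learning update toward the sampled target.
                target = reward if done else reward + gamma * max(
                    q[(next_state, a)] for a in ACTIONS)
                q[(state, action)] += alpha * (target - q[(state, action)])
                state = next_state
        return q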


What a coincidence. I was just reading a blog about concrete vs abstract interpretations of chess (and how to deal with its massive state space through abstract representation) this morning: http://www.msreverseengineering.com/blog/2018/2/26/concrete-...


Shameless self plug: if you're interested in Markov Decision Processes and related algorithms, I am the author and maintainer of a C++ and Python library that implements lots of planning and reinforcement learning methods for them.

I hope to officially publish it later this year, but it has already been used by a lot of people, in academia and out.

Feel free to play with it if you like!

https://github.com/Svalorzen/AI-Toolbox


(Author here.) That looks like a great library --- I will have to try it out!

In case it helps anyone, I also wrote a much smaller MDP library in ruby to support this 2048 project (and several others): https://github.com/jdleesmiller/finite_mdp . It basically just does value iteration and policy iteration.


Thank you too for putting the code for your article out there. There is always a shortage of public code to study this field, and every bit helps.

I've actually wanted to write a similar article on 2048 for a while, but unfortunately I've never found the time. But it seems now I won't have to; your article is great and much better than anything I could have ever done!


Sorry guys, but I won the 3x3 to 1024 on the first attempt, so the next 99 of you will only see losses. ;)

> We can also see it being ‘lazy’ — even when it has two high value tiles lined up to merge, it will continue merging lower value tiles. Particularly within the tight constraints of the 3x3 board, it makes sense that it will take the opportunity to increase the sum of its tiles at no risk of (immediately) losing — if it gets stuck merging smaller tiles, it can always merge the larger ones, which opens up the board.

One flaw in the strategy I noticed was a failure to prioritize getting higher values together when board space is scarce. That is, it seems to employ the same strategy regardless of free board space. (Maybe they addressed this; admittedly I didn't read it all.) Or maybe that is better than my strategy.


Board space creates uncertainty: on a wide-open board there are more places the new tiles can appear. It seems possible that controlling the amount (or, more precisely, the position) of board space helps reduce the odds of new tiles appearing in unhelpful places, so prematurely combining tiles could be an anti-strategy.


Are you sure? Maybe it delays moving large tiles together until the last moment before it risks being impossible to combine them.


It's better than your strategy.


Care to elaborate?


That’s what optimal means.


Strictly speaking, all we can say is that the optimal strategy used in my post is no worse than ballenf's. It's possible that ballenf's strategy is also optimal --- i.e. that they are equally good. :)


Yes! Optimal strategies for playing a game that is too complicated to reason out completely in your head, or that has some random element, often aren't unique.

Cepheus http://poker.srv.ualberta.ca/ is a Heads Up Limit Texas Hold 'Em strategy (two players, no betting strategy needed beyond whether to fold / call / raise) that is proved to be approximately optimal. There absolutely must be other strategies that are similar, and in particular circumstances one would dominate another; it's just that, since they're all optimal, if they played random hands over time they'd all break even.


Separate but equal. There can be different strategies that are exactly equal in terms of some performance measures.


If anyone is interested in the "state of the art" of solving 2048, I believe https://arxiv.org/pdf/1604.05085.pdf is currently the best, reaching the 32,768 tile in 70% of games.

The code on github: https://github.com/aszczepanski/2048


Back in the heyday of 2048, I wrote a solver based on Monte Carlo search[0]: given a board state, continue playing in memory using random moves. Do this multiple times. Choose the best-faring initial move.
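In rough Python, the idea is something like the following (a sketch, not the actual code linked in [0]; `legal_moves`, `apply_move`, `add_random_tile`, and `is_game_over` are placeholder helpers that return new boards rather than mutating):

    import random

    def random_playout(board):
        """Play uniformly random moves until the game ends; return the score gained."""
        total = 0
        while not is_game_over(board):
            board, gained = apply_move(board, random.choice(legal_moves(board)))
            total += gained
            board = add_random_tile(board)
        return total

    def choose_move(board, runs_per_move=100):
        """Score each legal first move by the average result of random playouts."""
        def score(move):
            after, gained = apply_move(board, move)
            rollouts = [random_playout(add_random_tile(after))
                        for _ in range(runs_per_move)]
            return gained + sum(rollouts) / runs_per_move
        return max(legal_moves(board), key=score)

The only real tuning knob is runs_per_move, which trades speed for playing strength.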

It does surprisingly well considering it has no domain-specific knowledge of the game (it reaches the 8192 tile).

Also this might be of interest: Many variants of 2048 came out at the time such as a hexagonal board or Fibonacci tiles. The algorithm is domain independent so it works just as well for these variants. Also, luckily, the original version of the game had its game state and controls exposed in JS and therefore so did most of the variants. This allowed me to write a bookmarklet version of the solver that could play many of the variants without modifications.[1]

[0] https://stackoverflow.com/questions/22342854/what-is-the-opt...

[1] https://ronzil.github.io/2048AI-AllClones/


This was a nice article. I've enjoyed running this 2048-ai[0] every now and then. It's fun to turn on the browser extension and watch it go. It's come quite a long way from the earlier versions.

[0] https://github.com/nneonneo/2048-ai


"it takes at least 938.8 moves on average to win"

60% of the time it works every time.


The game adds a 2 or a 4 tile on each move, with probabilities 0.9 and 0.1 respectively, so each move adds 0.9 * 2 + 0.1 * 4 = 2.2 to the board total on average. The expected number of moves to win is therefore at least 2048 / 2.2 ~= 930.9. That lower bound is only achieved if the 2048 tile is reached with nothing else left on the board; in practice, the larger the board, the more scattered the numbers will be, and the number of moves increases accordingly. This bound holds for every board size.


This task seems particularly well suited for Markov models because it actually is a Markov process underneath. Also, I don't know why, but I was particularly tickled by the sentence:

> At the risk of anthropomorphizing a large table of states and actions, which is what a policy is, I see here elements of strategies that I use when I play 2048 on the 4x4 board


For all the in-depth analysis of MDPs, a very good heuristic for playing 2048 is to just alternate swipes toward a corner (e.g. down and to the right), with an occasional movement in the other direction if it's stuck. How close does this get to the optimal policy?
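To make that concrete, the heuristic as a policy might look something like this (just a sketch; `legal_moves(board)` is a placeholder for whatever returns the subset of the four directions that actually change the board):

    def corner_policy(board, last_move=None):
        """Alternate 'down' and 'right' to keep tiles in a corner; only move
        'left' or 'up' when neither preferred direction is legal."""
        legal = legal_moves(board)
        preferred = 'right' if last_move == 'down' else 'down'
        for move in (preferred, 'down', 'right', 'left', 'up'):
            if move in legal:
                return move

Running this policy through a simulator many times and comparing the win rate with the article's optimal-policy numbers would answer the question empirically.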


Took me a while to realize that playing "optimally" here means optimally according to their chosen reward scheme, not optimal in the sense of the best overall strategy for playing 2048, which the title can suggest.

Anyway, nice solution and presentation.


"Conclusion

We’ve seen how to represent the game of 2048 as a Markov Decision Process and obtained provably optimal policies for the smaller games on the 2x2 and 3x3 boards and a partial game on the 4x4 board.

The methods used here require us to enumerate all of the states in the model in order to solve it. Using efficient strategies for enumerating the states and efficient strategies for solving the model makes this feasible for models with up to 40 billion states, which was the number for the 4x4 game to 64. The calculations for that model took roughly one week on an OVH HG-120 instance with 32 cores at 3.1GHz and 120GB RAM. The next-largest 4x4 game, played up to the 128 tile, is likely to contain many times that number of states and would require many times the computing power. Calculating a provably optimal policy for the full game to the 2048 tile will likely require different methods.

It is common to find that MDPs are too large to solve in practice, so there are a range of proven techniques for finding approximate solutions. These typically involve storing the value function and/or policy approximately, for example by training a (possibly deep) neural network. They can also be trained on simulation data, rather than requiring enumeration of the full state space, using reinforcement learning methods. The availability of provably optimal policies for smaller games may make 2048 a useful test bed for such methods — that would be an interesting future research topic."



