Programmer Creates An AI To (Not Quite) Beat NES Games (techcrunch.com)
399 points by npongratz on April 14, 2013 | 120 comments



FYI the video is worth a full watch. The author of the paper is painfully funny.


I can confirm it was well worth my 16 mins.

This really makes me want to go back to my game and play some more with AI. Although at the same time, I realize that's the mistake I made: making the game/platform for creating AI. It really takes a toll. This guy uses existing NES games, which lets him focus on the AI and even generic game learning. Brilliant.

You can clearly see the advantage AI has in short-term play (think on the order of milliseconds, with exact frame-by-frame button presses) over anything a human could ever achieve.

Imagine combining the short-term button-mashing of said AI with the long-term planning of human players...


You're looking for the pleasingly-symmetric-to-AI acronym IA, for Intelligence Amplification: http://en.wikipedia.org/wiki/Intelligence_amplification


On game AI, here's a neat tidbit.

Feed your bot AI with information that is 200 ms old and force it to extrapolate 200 ms in the future for its aiming/movement/etc. logic. It will make the bot much more human-like with noticeable reaction time, so it could be tricked by sudden changes in your behaviour. Such a simple but powerful AI trick.
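
Roughly, in Python (every name here is made up for illustration; a real bot would pull entity state straight from the engine rather than use this toy Observation type):

    import collections

    class Observation:
        def __init__(self, t, pos, vel):
            self.t = t      # timestamp in seconds
            self.pos = pos  # (x, y) of the target
            self.vel = vel  # (vx, vy) of the target

    class DelayedPerception:
        """Show the bot only ~200 ms old data, then extrapolate forward,
        so sudden changes in the player's behaviour fool it."""

        def __init__(self, delay=0.2):
            self.delay = delay
            self.buffer = collections.deque()
            self.last_seen = None

        def observe(self, obs):
            self.buffer.append(obs)

        def perceived_target(self, now):
            # Advance to the newest observation at least `delay` old;
            # anything fresher is invisible to the bot.
            while self.buffer and self.buffer[0].t <= now - self.delay:
                self.last_seen = self.buffer.popleft()
            if self.last_seen is None:
                return None
            # Linear extrapolation from the stale snapshot to "now":
            # wrong exactly when the player just changed course.
            dt = now - self.last_seen.t
            return (self.last_seen.pos[0] + self.last_seen.vel[0] * dt,
                    self.last_seen.pos[1] + self.last_seen.vel[1] * dt)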


Some of the tool-assisted speedruns feed scripts into the emulator. One of the Pokemon Yellow arbitrary code execution runs used some Clojure: http://tasvideos.org/3767S.html


Crikey, just read the paper:

""" On a scale from "the title starts with Toward" to "Donald Knuth has finally finished the 8th volume on the subject," this work is a 3. """


Both the video and paper are well worth going through!

The Abstract is 16 words. The introduction begins with "The Nintendo Entertainment System is probably the best video game console, citation not needed." Need we say more?

Also, I wish non-programming fields of media would also adopt pythonic triple-quotation marks.


Eww, why would you want triple quotation marks being adopted anywhere?

What function would they serve in print, exactly?


Many people forget that using inch marks in place of quotation marks (which don't have nesting issues) is an ASCII-only phenomenon.


They aren't "inch marks". They're straight double quote marks. The prime (used for feet, arc minutes), double prime (used for inches, arc seconds) and ditto mark are all distinct.


The paper is worth a full read as well. One of my favorite bits... "The FCEUX emulator is about a jillion lines of C++-ish code, was intended as an interactive GUI application, contains support for multiple different platforms, and the code is, on a scale from a pile of horse shit to not horse shit, approximately a 2"


As one of the major contributors to FCEUX, I feel that the author is misrepresenting the code quality of FCEUX. To be fair, I'd say the code would rate about 1.5 or maybe 1 on said scale.


Around 09:20 in the video he gets 500 points from kicking a shell into a koopa into another koopa. How does that 500 point score happen?


That's just normal in Super Mario Bros. 1: kills by shell start at 500 points and escalate with each additional target. The later games (SMB3, SMW, and the New SMB series) all start at 100 instead.


Very true. The end was hilarious.


Much better article here: http://www.wired.co.uk/news/archive/2013-04/12/super-mario-s...

The video is long but worth all 16 minutes. I like how he isolated such a dead simple heuristic for success in video games. It's not surprising, in retrospect, that "numbers going up" would correlate with winning, but to prove it in practice is pretty neat.


He isn't very explicit about it, but the "time travel" that is mentioned basically allows it to go back in time if things turn out badly. This is what allows it to make such unbelievable trick moves: in short timespans it is almost a brute-force search, but over the longer term it can't backtrack as much.

If that makes sense.
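
In case a sketch helps, it's something like this in Python. The Emulator interface here is hypothetical, but FCEUX-style emulators expose equivalent savestate/loadstate/frame-advance hooks:

    import random

    BUTTONS = ["left", "right", "A", "B", "none"]

    def play_with_time_travel(emu, objective, total_steps=1200,
                              horizon=30, tries=20):
        """Greedy search over short input bursts: try `tries` random
        futures from a savestate, commit the best, repeat."""
        inputs = []
        for _ in range(total_steps // horizon):
            state = emu.savestate()          # checkpoint: the "time machine"
            best_score, best_seq = float("-inf"), None
            for _ in range(tries):
                emu.loadstate(state)         # rewind to the checkpoint
                seq = [random.choice(BUTTONS) for _ in range(horizon)]
                for button in seq:
                    emu.step(button)
                score = objective(emu.memory())  # "did the counters go up?"
                if score > best_score:
                    best_score, best_seq = score, seq
            emu.loadstate(state)             # keep only the winning future
            for button in best_seq:
                emu.step(button)
            inputs.extend(best_seq)
        return inputs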


Brute force search with time in one dimension is equivalent to this "time travel", isn't it?


Yes, that is essentially what is happening.


Sounds to me like the next big improvement would be to give the algorithm a concept of boredom: If the numbers haven't gone up in a while, consider that very bad. Try progressively more crazy things.

For example, unpausing Tetris. Or having Mario run left.
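
Something like this, maybe (pure speculation on my part, and every name is invented):

    import random

    BUTTONS = ["right", "A", "B", "left", "down", "start", "select"]

    class BoredPlayer:
        def __init__(self, patience=300):
            self.patience = patience   # frames of flat score before full boredom
            self.flat_frames = 0
            self.best = float("-inf")

        def next_input(self, score):
            if score > self.best:
                self.best = score
                self.flat_frames = 0
            else:
                self.flat_frames += 1
            # Craziness ramps from 0 (stick to the usual moves) toward 1.
            craziness = min(1.0, self.flat_frames / self.patience)
            # Usual behaviour: heavily favour "run right and jump".
            # Bored behaviour: anything goes, including Start (which is
            # exactly how you'd stumble onto unpausing Tetris).
            if random.random() < craziness:
                return random.choice(BUTTONS)
            return random.choice(["right", "A"])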


> Try progressively more crazy things.

Teaching a computer various concepts of 'crazy' must be tons of fun! :-D


Pausing Tetris to avoid losing was hilarious.


Not sure if people noticed the last sentence in the article:

"Murphy ran a few other games through it, including Tetris, and found that the program would eventually just pause itself rather than continue playing and lose, a tactic shared by annoying, over-competitive cousins around the world since 1985."


"The only way to win is not to play"


"The only winning move is not to play."

The author is humorous, poetic, and brilliant.


and possibly a fan of WarGames :)


Exactly what I thought. I even think it was a clever way to end, and a point he might have been making as a sort of subtext.


I've never seen WarGames. After reading the synopsis, the subtext is undeniably intentional.


30-year-old movie spoiler alert!!!!

WarGames: Very, very basically, from deep, deep memory...

A computer about to launch a global nuclear war is made to play tic-tac-toe to learn that it can't win. It deduces that the only way to win a nuclear war is not to start the thing in the first place. Any other strategy is a loss, or what we call mutually assured destruction.

It's obviously more complicated than that. Even though it's 30-odd years old, it's worth a watch.



Nice :)

But no.


If you look at the other work Tom7 has done over time, you'd find that's generally the case with his stuff.


Shit. I just lost.


The "you can't fire me, I quit" approach. I like it.



Just to note that the AI has complete access to the NES's RAM (and also backtracks whenever it gets into trouble), which is not quite fair to compare to other approaches to AI or playing against a human. I'd be pretty damned good at battleship if I could see your board and rewind whenever I missed.


Yes, but the point is also that the program doesn't really know the goal; it just tries to increase bytes that look like counters. That's the fun thing.

You should compare it with a game whose rules you don't know, where all you can understand is your score: like trying to play Spider Solitaire on Windows without knowing how the game works (it allows you to undo your moves).
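
A simplified sketch of the "bytes that look like counters" idea, in Python. The real learnfun is fancier (it learns lexicographic orderings over tuples of bytes), so this only captures the flavour:

    def find_counters(frames, tolerance=0.05):
        """frames: list of 2048-byte NES RAM snapshots, one per frame.
        Returns the addresses whose value goes up but almost never down."""
        counters = []
        for addr in range(len(frames[0])):
            values = [f[addr] for f in frames]
            ups = sum(1 for a, b in zip(values, values[1:]) if b > a)
            downs = sum(1 for a, b in zip(values, values[1:]) if b < a)
            if ups > 0 and downs <= tolerance * len(values):
                counters.append(addr)
        return counters

    def objective(ram, counters):
        # "Make these numbers go up", collapsed to a single scalar.
        return sum(ram[addr] for addr in counters)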


I agree that it is an interesting result; however, having access to everything is weird from an AI point of view and really muddies any conclusions to be drawn. The headline makes it sound like this is a breakthrough in AI, when it is using a cheat that won't apply to any sort of real situation.

Notice that in your example of solitaire, knowing the values of hidden cards really changes the nature of the game and makes playing it much less strategic.


Not sure why TechCrunch made it seem that way ("a breakthrough in AI"), when the author explicitly says he's submitted this to SIGBOVIK 2013 -- an April 1 conference that usually publishes fake research. (http://sigbovik.org/2013/)

It is clearly an April Fool's hack.


Good, I'm not the only person who suspected that.

This is from the paper. "I tried again, and it was a huge breakthrough: Mario jumped up to get all the coins, and then almost immediately jumped back down to continue the level! On his first try he beat 1-2 and then immediately jumped in the pit at the beginning of 1-3 just as I was starting to feel suuuuuper smart (Figure 7). On a scale of OMFG to WTFLOL I was like whaaaaaaat?"


The interesting part wasn't the algorithm that played the game, but another algorithm that learned what the goals and subgoals of the game were just from watching a single playthrough. The AI that played the game was just to prove it worked.


And that's probably why he didn't showcase playfun on Battleship.

I do agree with you though.


But the AI isn't making strategic decisions. It's making brain-dead ones. It analyses a play-through looking for sections of RAM that increase as time increases. It then locks in on these sections of RAM and says "increasing the values in these sections of RAM is how I determine success." It then brute-forces a play-through with the ability to rewind time (to try a different set of inputs).

This is why he mentions that Karate Kid didn't work well: one of the factors of a successful play-through is that your opponent's health is decreasing, which is something that this AI doesn't look for.

The ability to rewind time wouldn't make for a very interesting play-through of Battleship, but depending on how game state is stored, it might not even be able to accurately deduce a good game state from a bad one.


> This is why he mentions that Karate Kid didn't work well: one of the factors of a successful play-through is that your opponent's health is decreasing, which is something that this AI doesn't look for.

I don't believe that was the case. From the video, he mentioned it used up all the power kicks on the first enemy, because they produced the most favorable outcome for that fight, at the expense of later fights when more powerful moves are needed (since the AI can't plan that far ahead).


From the paper, written by the same person who made the video: "The result is not impressive at all; the main goal here is to reduce the opponent's health, but our objective function can only track bytes that go up."

I can't watch the video at the moment, but I imagine the paper goes into much more depth on the internals of this AI.


Yeah, I'm wondering how it might work if instead of using a sequence of byte configurations in memory it used a sequence of pixel configurations from the screen.


Video is hilarious and worth watching the whole thing, as others have mentioned, but there's some great stuff in the paper also. He mentions some really interesting future directions, e.g. obviating the need for the training data set of a human playing the game by having the computer initially try different sets of random inputs and continuously improving on those that do the "best" according to its objective function. The bit about the computer trying to play "dinosaur coloring" is fantastic also.


The day has come, "AI rage quits, passes Turing test."


My favorite part was how the program finds glitches and uses them ruthlessly.


At university we had a contest with programmable tanks. Many wrote elaborate algorithms and tried to find the optimal strategy, but the winner was very crude: it tried to get a little bit more life than its opponent, and once it was ahead it would immediately start spawning threads until the play server couldn't handle it anymore and crashed -- winning by default.


I had a similar kind of assignment, using a Java system called Robocode. I ended up writing a system that could track a tank moving in a static arc, while moving itself, which worked surprisingly well.


I wonder if a more advanced version could be used as a bug finding tool for modern games.


I love that idea! What seems to us implausible looks, to the code, like nothing other than a legitimate strategy.

The issue for modern games is the amount of memory used.


Then they taught it to make paperclips, and ended up with one of these:

http://wiki.lesswrong.com/wiki/Paperclip_maximizer


Ha! This shows up in pop culture in the game Star Control II (awesome game btw):

spoiler alert | You are travelling through space encountering an ever-increasing number of what appear to be robotic ships with whom you can briefly interact before they end communication and attack you. Long story short: a planet-bound race, the Slylandro, purchased a robotic ship for exploration and to establish friendly contact with other races. The ship had the power to self-replicate; it also carried a small defensive weapon and a powerful lightning-bolt type tool for breaking down and gathering resources for ship replication. The Slylandro wanted more probes faster, so they turned the replication priority up to 11, overriding all other priorities. When a ship meets you, it attempts to initiate contact but is quickly forced into resource-gathering/replication mode and proceeds to break apart your ship into raw components. It is, essentially, a paper-clip maximizer. :)

http://wiki.uqm.stack.nl/Probe


No need. In order to maximize score, two NESes are better than one... and three are better than two...


Until it finds a glitch in the matrix that allows it to pause real life.


Brute-forcing all the states of an NES game could result in finding a vulnerability in the NES emulator. By filling the emulator's memory with FFs or changing its settings, it could then make a mess of any accessible file system or, some lifespans of the universe later, find vulnerabilities in the OS, the hardware, or the connected network.


"This game is mostly a go to the right and jump and throw hammers at birds..." Classic delivery. I love this guy.


He made me laugh repeatedly.


If your numbers are going up, it means you are having more fun!

http://radgeek.com/gt/2006/04/28/antieconometrix_comix/

It's too bad that games that need more lookahead (Tetris, etc.) are not amenable to this approach.


It sounds to me like how deterministic a game is basically determines whether this works. For a Tetris where the pieces appear in a defined order, I bet this algorithm could do pretty well.


For the purposes of this AI player, all NES games are fully deterministic. Even whatever hardware the NES had that could be used for random number generation is under the emulator's control and is being run deterministically. There's no "random" in the usual sense here.


It seems like the problem is more that it can't look far enough into the future to see that grabbing lots of points now (by stacking blocks) is canceled out by losing (that is, not gaining) points later.

It would be interesting to see how it responded when the 'Pause' function was disabled. It seems like it would need to track 'past future' state to find a solution.

Edit: Or maybe try to filter out 'constant' sources of increasing value. Treat the stacking points as noise.
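
A sketch of that filtering idea (my own interpretation, not anything from the paper): compare a skilled playthrough against pure button-mashing and drop the bytes that climb either way:

    def filter_constant_counters(counters, human_frames, random_frames):
        """Keep only counters that rise faster under skilled play than
        under button-mashing; the rest reward stalling, so treat them
        as noise."""
        def rate(frames, addr):
            return (frames[-1][addr] - frames[0][addr]) / len(frames)
        return [addr for addr in counters
                if rate(human_frames, addr) > 2 * rate(random_frames, addr)]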


Pardon me if I'm really ignorant of the AI field here, but doesn't this look like the beginning of more general artificial intelligence? The organic feel of some of these moves is almost scary to me.


Keep in mind that it's using time travel to do those moves. In the real world time travel is not currently available.


What exactly does "time travel" mean in this context? That the simulation makes some moves, sees what happened, and then has the ability to undo its moves and go back to its old state, and then try new moves?


Yes. It's basically a brute-force Groundhog Day approach to finding the best move.


Another movie analogy would be Next. Cris Johnson can see two minutes into the future and redo anything that has a bad outcome.

http://en.wikipedia.org/wiki/Next_(2007_film)


Yes, exactly.


When a baseball player swings a bat, in anticipation of the ball arriving just in time to get knocked out of the park, do you consider that time travel?

Is the baseball player's anticipation of the future really all that different from what this program is doing?


Yes, it's incredibly different! A huge problem in AI is internally modelling the outside world. This completely sidesteps that.


Maybe, but that doesn't really have anything to do with the "time travel" question.


Of course it does. Models are useful because they have predictive power. If you have time travel, you don't need to predict anything.


I think the difference is that the computer can try the input and record the result and then try again. The baseball player can't have a redo of that pitch if they miss or if it's a foul.

I don't really see how it's time travel but that's what he called it.


The program fails repeatedly and then loads a previous state to try a different move; what you see is the result of multiple trials. It can't reliably get it right on the first try.


Neither can a baseball player. He's been practicing since age 3 to hit that ball.

Is a decade or two of baseball practice really all that different from a computer playing and re-playing the game over and over looking for optimal strategy?


It is if it can't learn from those mistakes.


Simulation and models often are, though.


This looks similar to the genetic algorithm technique I used for the AI Programmer project http://www.primaryobjects.com/CMS/Article149.aspx

Genetic algorithms (and variants) can pick up some very interesting patterns for achieving the target fitness. I don't think this is necessarily anything new, and GAs have their limitations, but applying them to multiple NES games is impressive. I'm convinced that GAs will be an important part of future computing.
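
For contrast with the paper's search, a minimal GA over button sequences looks roughly like this. The fitness function is assumed to replay a sequence in an emulator and return whatever objective you've learned:

    import random

    BUTTONS = ["left", "right", "A", "B", "none"]

    def evolve(fitness, seq_len=600, pop_size=50, generations=100,
               mutation_rate=0.02):
        pop = [[random.choice(BUTTONS) for _ in range(seq_len)]
               for _ in range(pop_size)]
        for _ in range(generations):
            ranked = sorted(pop, key=fitness, reverse=True)
            survivors = ranked[:pop_size // 2]
            children = []
            while len(survivors) + len(children) < pop_size:
                a, b = random.sample(survivors, 2)
                cut = random.randrange(seq_len)       # one-point crossover
                child = a[:cut] + b[cut:]
                for i in range(seq_len):              # point mutation
                    if random.random() < mutation_rate:
                        child[i] = random.choice(BUTTONS)
                children.append(child)
            pop = survivors + children
        return max(pop, key=fitness)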


The scary thing is that artificial intelligence isn't about incredibly complex systems, like you would assume. It uses some rather simple algorithms, which are often not unlike (or modelled on) things in real life. We are nowhere near a proper intelligence; we just have a bunch of algorithms that 'seem' intelligent. AI is mostly about pattern matching.


Related: genetic algorithms to play video games. I can't find the exact one I read about before, which was for Super Mario Bros., but searching for "mario genetic algorithm" leads to a number of papers, videos, &c.




I love the fact that a newspaper can still publish this without fear of losing readers. It's like eating at a restaurant that everyone says qualifies as dog food but... well... we go anyway :)


Very funny video and worth watching.

But it's not even close to beating NES games. I'm not an expert in AI, but I'm not sure it actually achieves anything.

I'd consider it more an art project than a computer science project.


Note that it was published as part of SIGBOVIK. This paper definitely falls within the approaches to computer science exemplified by the illustrious, eponymous Harry Q. Bovik.


You must not be very fun at parties.


Depends on what sort of parties you like.


You will be amused to learn that the author, Tom 7, also wrote the Crap Art Manifesto at http://crapart.spacebar.org/, which is equally funny and even more thought-provoking.


Using A* does a much better job: http://www.youtube.com/watch?v=DlkMs4ZHHr8

Look at that AI play. It's godly.


The paper uses an approach general enough for any game on the NES, which is what makes it exceptional. The humor is killer, too.


The goal isn't to beat Mario as efficiently as possible, but to create a program that can learn to play literally any game.

Also, what is that AI doing when it decides to jump into the pit at 0:45? It clearly sees the path to jump over it, but it chooses the one that jumps into it. Watching it frame by frame, it seems to figure out a path to jump out of the pit several times, but chooses not to, until eventually it finally does.


The ending is very poetic in a way.


More authors should make videos like this. Well done!


I wonder how this would handle an NES golf game.


In "NES Open", you earned money depending on how you placed. So, given long enough jumps back in time, there is a way to track an increasing score.

Either way though, I think the results would be sub-par.


slowclap.gif


[deleted]


If I had named it golfclap maybe you would understand. Thanks for being a dick.


The only winning move... is not to play.


Uninformed speculative question here: could the algorithm be improved with some sort of momentum term in the input selection?

It seems to succeed with move-to-the-right (position counter) but fall into what looks like Brownian motion on Pac-Man. It seems to me that putting momentum into its strategy (and changing when the objective function declined) might reduce the implicit branching tree.

What it simulates, then, is the tendency for people to fall into "a groove" and depart from it when their objective function ceases increasing.
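
As a sketch of what I mean (purely speculative, nothing from the paper):

    import random

    BUTTONS = ["left", "right", "A", "B", "none"]

    def pick_input(prev_button, score_delta, momentum=0.9):
        """Keep doing what we were doing while the objective climbs;
        once it stops climbing, leave 'the groove' and explore."""
        if prev_button is not None and score_delta > 0:
            if random.random() < momentum:
                return prev_button
        return random.choice(BUTTONS)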


I'd really like to see a fuller analysis of what Pac-Man was doing there. As I understand the algorithm, it should have seen into the future where the dot gets eaten, and not turned back.

But maybe the program failed to discover the bytes which hold the score, and so it didn't think eating dots was a particularly notable activity.


Maybe it gets distracted increasing the pac-scent? There's certainly a lot more of that around in memory.


I hope I live to see the day when robots can beat games without any prior knowledge of the game logic. Well, actually, it would be quite scary.


Scary indeed. Those robots might be very good at ... Thermonuclear War.


... sooo, that's not really a great application of a GA. Perhaps not even a great implementation.

Akin to making things out of Legos at best. (i.e.: "because!")


It's not an application of a GA at all. Before dismissing something, it's best to understand it.


Considering there are bots that can beat you at Call of Duty and Quake, this is not such a feat.

I can't find a reference, but I recall an NES emulator that was able to act as player 2 for something like 40 NES games.


I imagine the bots you mentioned are programmed with specific logic for those games. As for the NES emulator capable of playing the second player, I imagine that would be a similar story.

The research here is not a computer program that plays games. Rather, it's a computer program that (attempts to) play any game, without having any understanding of what it means to play that game, purely by watching the buttons that a human player would press while playing that game.


I think you missed the point somewhat. Those bots were coded just for those games, with human-made models and strategies adapted to them.

This bot, on the other hand, understands nothing about the games themselves besides the input keys and the score number(s). It's playing blind.


Having written real AIs, this ain't one.

Four directions and two buttons. You could point a crappy machine learning algo at it, and it would figure it out. Those are far fewer variables to manipulate than what a Quake bot has to deal with.

I would bet money I could zap a pigeon every time it died in Mario and it would learn to be a master in 12 hours. It won't master CoD anytime soon.


How would it figure out what it means to be "winning"? It might be possible, but it would require using OCR to read the 'score' (and guessing at what is meant by the score) and would also need some way to determine whether the goal is to "go to the right", etc. (also possible). Doing this would be interesting and more practical, in that it could work with games where you do not have access to the memory, but it seems like it would require more work in interpreting how the game's state has changed, and it wouldn't allow going back in time.
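
For what it's worth, the score-reading part alone isn't so bad, assuming you can locate the HUD and have templates for the game's digit font (both hypothetical inputs here):

    import numpy as np

    def read_score(frame, digit_templates, score_box):
        """frame: grayscale screen capture as a 2-D numpy array.
        digit_templates: dict mapping 0-9 to h x w template arrays.
        score_box: (y, x, h, w, n_digits) locating the score HUD."""
        y, x, h, w, n = score_box
        score = 0
        for i in range(n):
            glyph = frame[y:y + h, x + i * w:x + (i + 1) * w].astype(int)
            # Nearest digit template by total pixel difference.
            digit = min(digit_templates, key=lambda d:
                        np.abs(digit_templates[d].astype(int) - glyph).sum())
            score = score * 10 + digit
        return score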

A Quake bot is easier to develop because it is written for a specific game.

I accept that bet for training a pigeon to play SMB :), although the scenario you described isn't comparable (it is only one game, and you would be giving the pigeon extra feedback about how to play it). It still seems hard, and I think it would be notable if you achieved it.


This AI has access to all of the RAM of the game.

There are Guitar Hero bots that "OCR" (your words) the screen to play the guitar. Not such a hard trick to read a score or see a turtle. (You know we have self-driving cars, right?)


I think trying to understand what is going on in a game by looking at the screen would be more challenging than trying to understand by looking at the game's memory if you want to write a bot that can play any generic game.

The examples you gave are each specific to particular domains.

This whole topic reminds me of how it is much easier to develop natural language processing for use in a particular domain compared to generically.


I write "generic" Natural language processing for a living.

Also "nintendo" is a particular domain. When you only have a "language" of 8 directions, A, and B, and only time to deal with in response to a screen that side scrolls in only one direction the programming is easy.

A chess engine has more potential decisions for a bishop than this has for a given move. And unlike the the NES, the chess engine has to adjust for changes in the behavior. Given the same input the NES would make the same choices. You can macro through most the games.


I think 2 things need to be emphasised.

1. This was a paper submitted for a joke conference. It is in no way a giant leap in AI. It is something that some nerds find amusing. Like an anti-Rube Goldberg machine. It is trying to accomplish something quite complex using a ridiculously stupid approach.

2. It's stupid because it is looking at the memory state as just a list of numbers that change during gameplay. It has no knowledge of the actual game. It doesn't parse the memory to figure out game state or anything. It doesn't even have any generic idea of what games involve. It figures out "ooh, the number that starts at location 243 goes up in the training data. I'll mash buttons to try to make memory location 243 go up".


You're taking this way too seriously. Think of it as someone tinkering with a specific algorithm who happens to overlay it on top of NES games to show that it's working.

It's a cool project. If this is nothing compared to the things you've done, you're welcome to post them; we'd love to see them!


That won't work; the pigeon will stop playing. But a pigeon will recognize paintings by a certain painter, or steer a missile to a target. Relative to the state of the art in machine learning, pigeons are fairly sophisticated, so it means little to claim that a pigeon might learn to do something.


As the author says, the point is not the results, but the elegance of his generic approach.


The CoD / Quake bots were programmed to play those games.

The emulator was presumably programmed to play those 40 games, too.

But the real measure of intelligence is to throw something at the AI that it's never seen before, and see how it performs. And that's the magic of this one: it's a general-purpose video game playing AI. You can find a new game that it wasn't specifically designed to play, and it will quite often do a decent job playing it.



