This really makes me want to go back to my game and play some more with AI. Although, at the same time I realize that's the mistake I made - making the game/platform for creating AI. It really takes a toll. This guy uses existing NES games, which allows him to focus on the AI and even generic game learning. Brilliant.
You can clearly see the advantages of AI in the short-term play (think in the order of milliseconds, exact frame-by-frame button presses) over anything a human could ever achieve.
Imagine combining the short-term button-mashing of said AI with the long-term planning of human players...
Feed your bot AI with information that is 200 ms old and force it to extrapolate 200 ms in the future for its aiming/movement/etc. logic. It will make the bot much more human-like with noticeable reaction time, so it could be tricked by sudden changes in your behaviour. Such a simple but powerful AI trick.
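A minimal sketch of that trick, assuming a 60 fps game and simple linear extrapolation (the class, the delay constant, and the state shape are all made up for illustration):

```python
from collections import deque

DELAY_FRAMES = 12  # ~200 ms at an assumed 60 fps

class DelayedPerception:
    """Hands the bot world state that is DELAY_FRAMES old, then
    linearly extrapolates it forward to fake human reaction time."""

    def __init__(self, delay=DELAY_FRAMES):
        self.history = deque(maxlen=delay)

    def observe(self, position, velocity):
        # Record the true state; the bot never sees the newest entries.
        self.history.append((position, velocity))

    def perceived_state(self):
        # Oldest buffered state, extrapolated DELAY_FRAMES ahead.
        if not self.history:
            return None
        pos, vel = self.history[0]
        return pos + vel * self.history.maxlen, vel

# A target moving at constant velocity is predicted correctly, but a
# sudden direction change within the last 12 frames fools the bot,
# just as it would fool a human.
p = DelayedPerception()
for frame in range(30):
    p.observe(position=frame * 2.0, velocity=2.0)
```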
Some of the tool-assisted speedruns feed scripts into the emulator. One of the Pokemon Yellow arbitrary code execution runs used some Clojure: http://tasvideos.org/3767S.html
Both the video and paper are well worth going through!
The Abstract is 16 words. The introduction begins with "The Nintendo Entertainment System is probably the best video game console, citation not needed." Need we say more?
Also, I wish non-programming fields of media would also adopt pythonic triple-quotation marks.
They aren't "inch marks". They're straight double quote marks. The prime (used for feet, arc minutes), double prime (used for inches, arc seconds) and ditto mark are all distinct.
The paper is worth a full read as well. One of my favorite bits: "The FCEUX emulator is about a jillion lines of C++-ish code, was intended as an interactive GUI application, contains support for multiple different platforms, and the code is, on a scale from a pile of horse shit to not horse shit, approximately a 2."
As one of the major contributors to FCEUX, I feel that the author is misrepresenting the code quality of FCEUX. To be fair, I'd say the code would rate about 1.5 or maybe 1 on said scale.
That's just normal in Super Mario Bros. 1: shell kills start at 500 points and escalate with each additional target. The later games (SMB3, SMW, and the New SMB series) start at 100 instead.
The video is long but worth all 16 minutes. I like how he isolated such a dead simple heuristic for success in video games. It's not surprising, in retrospect, that "numbers going up" would correlate with winning, but to prove it in practice is pretty neat.
He isn't very explicit about it, but the "time travel" that is mentioned basically allows it to go back in time if things turn out badly. This is what allows it to make such unbelievable trick moves: in short timespans it is almost a brute-force search, but over the longer term it can't backtrack as much.
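As a toy illustration (not the paper's actual search, which is much cleverer about time scales), the rewind mechanic can be sketched as savestate-and-retry over short input sequences; everything here, including the toy game, is made up:

```python
import copy
import random

def play_with_rewind(start_state, step, objective, all_inputs,
                     horizon=5, attempts=20):
    """Savestate-based search: from a saved state, try `attempts`
    random input sequences of length `horizon`, rewinding after each,
    and keep the sequence that scores best on the objective."""
    savestate = copy.deepcopy(start_state)
    best_score, best_seq = float("-inf"), None
    for _ in range(attempts):
        state = copy.deepcopy(savestate)          # "time travel": rewind
        seq = [random.choice(all_inputs) for _ in range(horizon)]
        for inp in seq:
            step(state, inp)                      # emulate forward
        score = objective(state)
        if score > best_score:
            best_score, best_seq = score, seq
    return best_seq

# Toy "game": state is Mario's x position; right moves +1, left -1.
def step(state, inp):
    state["x"] += 1 if inp == "right" else -1

seq = play_with_rewind({"x": 0}, step, lambda s: s["x"], ["left", "right"])
```

Over a short horizon this is nearly brute force; over a long one the number of sequences explodes, which is why it can't backtrack as much at longer time scales.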
Sounds to me like the next big improvement would be to give the algorithm a concept of boredom: if the numbers haven't gone up in a while, consider that very bad and try progressively crazier things.
For example, unpausing Tetris. Or having Mario run left.
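A sketch of what such a boredom knob might look like (the threshold and the linear ramp are arbitrary assumptions, not anything from the paper):

```python
import random

BOREDOM_LIMIT = 300  # frames of flat score before full craziness (made up)

def choose_input(favored_input, frames_since_progress, all_inputs):
    """The longer the numbers haven't gone up, the more likely we are
    to ignore the currently favored input and try something random."""
    craziness = min(1.0, frames_since_progress / BOREDOM_LIMIT)
    if random.random() < craziness:
        return random.choice(all_inputs)  # bored: try crazy things
    return favored_input                  # progressing: stay the course
```

With a fresh score increase it always exploits; after 300 flat frames every move is a random experiment, which is exactly when a paused Tetris game or a leftward-running Mario might get unstuck.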
Not sure if people notice the last sentence in the article:
"Murphy ran a few other games through it, including Tetris, and found that the program would eventually just pause itself rather than continue playing and lose, a tactic shared by annoying, over-competitive cousins around the world since 1985."
War Games: Very, very basically, from deep, deep memory...
A computer about to launch a global nuclear war is made to play tic-tac-toe until it learns it can't win. It deduces that the only way to win a nuclear war is not to start one in the first place; any other strategy is a loss, or what we call mutually assured destruction.
It's obviously more complicated than that. Even though it's 30-odd years old, it's worth a watch.
Just to note that the AI has complete access to the NES's RAM (and also backtracks whenever it gets into trouble), which is not quite fair to compare to other approaches to AI or playing against a human. I'd be pretty damned good at battleship if I could see your board and rewind whenever I missed.
Yes, but the point is also that the program doesn't really know the goal; it just tries to increase bytes that look like counters. That's the fun thing.
You should compare it with a game whose rules you don't know, where the only thing you can understand is your score: like trying to play Spider Solitaire on Windows without knowing how the game works (it allows you to undo your moves).
I agree that it is an interesting result; however, having access to everything is weird from an AI point of view and really muddies any conclusions to be drawn. The headline makes it sound like this is a breakthrough in AI, when it is using a cheat that won't apply to any sort of real situation.
Notice that in your solitaire example, knowing the values of the hidden cards really changes the nature of the game and makes playing it much less strategic.
Not sure why TechCrunch made it seem that way ("a breakthrough in AI"), when the author explicitly says he's submitted this to SIGBOVIK 2013 -- an April 1 conference that usually publishes fake research. (http://sigbovik.org/2013/)
This is from the paper: "I tried again, and it was a huge breakthrough: Mario jumped up to get all the coins, and then almost immediately jumped back down to continue the level! On his first try he beat 1-2 and then immediately jumped in the pit at the beginning of 1-3 just as I was starting to feel suuuuuper smart (Figure 7). On a scale of OMFG to WTFLOL I was like whaaaaaaat?"
The interesting part wasn't the algorithm that played the game, but another algorithm that learned what the goals and subgoals of the game were just from watching a single playthrough. The AI that played the game was just to prove it worked.
But the AI isn't making strategic decisions. It's making brain-dead ones. It analyses a play-through looking for sections of RAM that increase as time passes. It then locks in on those sections and says "increasing the values in these sections of RAM is how I determine success." It then brute-forces a play-through with the ability to rewind time (to try a different set of inputs).
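Roughly, that lock-on step could be sketched like this (the real paper uses weighted lexicographic orderings over memory locations; this monotone-byte filter is a deliberate simplification):

```python
def find_score_like_bytes(snapshots):
    """Given RAM snapshots (equal-length sequences of byte values)
    recorded over a playthrough, return the addresses whose value
    never decreases and increases at least once: crude stand-ins
    for score, level, and position counters."""
    candidates = []
    for addr in range(len(snapshots[0])):
        values = [snap[addr] for snap in snapshots]
        never_decreases = all(a <= b for a, b in zip(values, values[1:]))
        ever_increases = values[0] < values[-1]
        if never_decreases and ever_increases:
            candidates.append(addr)
    return candidates

def objective(snapshot, candidates):
    # "Success" is simply making the score-like bytes bigger.
    return sum(snapshot[addr] for addr in candidates)

# Toy RAM: address 1 behaves like a score; addresses 0 and 2 don't.
ram_over_time = [
    [7, 0, 5],
    [7, 10, 4],
    [7, 25, 9],
]
```

This also shows why a value that only ever goes *down*, like an opponent's health bar, is invisible to the objective.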
This is why he mentions that Karate Kid didn't work well: one of the factors of a successful play-through was your opponent's health decreasing, which is something this AI doesn't look for.
The ability to rewind time wouldn't make for a very interesting play-through of Battleship, but depending on how game state is stored, it might not even be able to accurately deduce a good game state from a bad one.
I don't believe that was the case. In the video he mentioned it using up all the power kicks on the first enemy because they produced the most favorable outcome for that fight, at the expense of later fights when more powerful moves are needed (since the AI can't plan that far ahead).
From the paper, written by the same person who made the video:
"The result is not impressive at all; the main goal here is to reduce the opponent's health, but our objective function can only track bytes that go up."
I can't watch the video at the moment but I imagine the paper goes much farther in-depth with the internals of this AI.
Yeah, I'm wondering how it might work if instead of using a sequence of byte configurations in memory it used a sequence of pixel configurations from the screen.
The video is hilarious and worth watching in full, as others have mentioned, but there's some great stuff in the paper also. He mentions some really interesting future directions, e.g. obviating the need for a training data set of a human playing the game by having the computer initially try different sets of random inputs and continuously improve on those that do the "best" according to its objective function. The bit about the computer trying to play "dinosaur coloring" is fantastic too.
At university we had a contest with programmable tanks. Many wrote elaborate algorithms and tried to find the optimal strategy, but the winner was very crude: it tried to get a little bit more life than its opponent, and once it was ahead it would immediately start spawning threads until the play server couldn't handle it anymore and crashed, winning by default.
I had a similar kind of assignment, using a Java system called Robocode. I ended up writing a system that could track a tank moving in a static arc while moving itself, which worked surprisingly well.
Ha! This shows up in pop culture in the game Star Control II (awesome game btw):
spoiler alert | You are travelling through space encountering an ever-increasing number of what appear to be robotic ships with whom you can briefly interact before they end communication and attack you. Long story short: a planet-bound race, the Slylandro, purchased a robotic ship for exploration and to establish friendly contact with other races. The ship had the power to self-replicate; it also carried a small defensive weapon and a powerful lightning-bolt type tool for breaking down and gathering resources for ship replication. The Slylandro wanted more probes faster, so they turned the replication priority up to 11, overriding all other priorities. When a ship meets you, it attempts to initiate contact but is quickly forced into resource-gathering/replication mode and proceeds with breaking apart your ship into raw components. It is, essentially, a paper-clip maximizer. :)
Brute-forcing all the states of an NES game could end up finding a vulnerability in the NES emulator itself; by filling the emulator's memory with FFs through changed settings it could make a mess of the accessible file system or, some lifespans of the universe later, find vulnerabilities in the OS, hardware, or connected network.
It sounds to me like it's basically how deterministic a game is that determines whether this works. For a Tetris with a defined order that the pieces appear in, I bet this algorithm could do pretty well.
For the purposes of this AI player, all NES games are fully deterministic. Even what hardware the NES had that could be used for random number generation is under the control of the emulator and being run deterministically. There's no "random" in the usual sense here.
It seems like the problem is more that it can't look far enough into the future to see that grabbing lots of points now (by stacking blocks) is canceled out by losing (that is, not gaining) points later.
It would be interesting to see how it responded when the 'Pause' function was disabled. It seems like it would need to track 'past future' state to find a solution.
Edit: Or maybe try to filter out 'constant' sources of increasing value. Treat the stacking points as noise.
Pardon me if I'm really ignorant of the AI field here, but doesn't this look like the beginning of more general artificial intelligence? The organic feel of some of these moves is almost scary to me.
What exactly does "time travel" mean in this context? That the simulation makes some moves, sees what happened, and then has the ability to undo its moves and go back to its old state, and then try new moves?
When a baseball player swings a bat, in anticipation of the ball arriving just in time to get knocked out of the park, do you consider that time travel?
Is the baseball player's anticipation of the future really all that different from what this program is doing?
I think the difference is that the computer can try the input and record the result and then try again. The baseball player can't have a redo of that pitch if they miss or if it's a foul.
I don't really see how it's time travel but that's what he called it.
The program fails repeatedly and then loads a previous state to try a different move; what you see is the result of multiple trials. It can't reliably get it right on the first try.
Neither can a baseball player. He's been practicing since age 3 to hit that ball.
Is a decade or two of baseball practice really all that different from a computer playing and re-playing the game over and over looking for optimal strategy?
Genetic algorithms (and variants) can pick up some very interesting patterns for achieving the target fitness. I don't think this is necessarily anything new, and GAs have their limitations, but applying them to multiple NES games is impressive. I'm convinced that GAs will be an important part of future computing.
The scary thing is that artificial intelligence isn't about incredibly complex systems, like you would assume. It uses some rather simple algorithms, which are often not unlike (or are modelled on) things in real life. We are nowhere near a proper intelligence; we just have a bunch of algorithms that 'seem' intelligent. AI is mostly about pattern matching.
Related: genetic algorithms to play video games. I can't find the exact one I read about before, which was for Super Mario Bros., but searching for "mario genetic algorithm" leads to a number of papers, videos, &c.
I love the fact that a newspaper can still publish this without fear of losing viewers. It's like eating at a restaurant whose food everyone says qualifies as dog food but... well... we go anyway :)
Note that it was published as part of SIGBOVIK. This paper definitely falls within the approaches to computer science exemplified by the illustrious, eponymous Harry Q. Bovik.
You will be amused to learn that the author, Tom 7, also wrote the Crap Art Manifesto at http://crapart.spacebar.org/, which is equally funny and even more thought-provoking.
The goal isn't to beat Mario as efficiently as possible, but to create a program that can learn to play literally any game.
Also what is that AI doing when it decides to jump into the pit at 0:45? It clearly sees the path to jump over it, but it chooses the one that jumps into it. Watching it frame by frame it seems to figure out a path to jump out of the pit several times, but chooses not to, until eventually it finally does.
Uninformed speculative question here: could the algorithm be improved with some sort of momentum term in the input selection?
It seems to succeed with move-to-the-right (position counter) but falls into what looks like Brownian motion on Pac-Man. It seems to me that putting momentum into its strategy (and changing course when the objective function declined) might reduce the implicit branching tree.
What it simulates, then, is the tendency for people to fall into "a groove" and depart from it when their objective function ceases increasing.
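For what it's worth, a momentum term over input selection might look something like this sketch (the 0.8 weight and the input set are arbitrary assumptions):

```python
import random

def momentum_weights(all_inputs, prev_input, momentum=0.8):
    """Probability distribution over controller inputs that favors
    repeating prev_input with weight `momentum` and splits the rest
    uniformly, shrinking the effective branching factor of the search."""
    rest = (1.0 - momentum) / (len(all_inputs) - 1)
    return [momentum if inp == prev_input else rest for inp in all_inputs]

inputs = ["left", "right", "up", "down"]
w = momentum_weights(inputs, "right")
# Sampling with these weights keeps Pac-Man "in a groove" instead of
# jittering every frame; on an objective decline, reset to uniform.
choice = random.choices(inputs, weights=w)[0]
```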
I'd really like to see a fuller analysis of what Pac-Man was doing there. As I understand the algorithm, it should have seen into the future where the dot gets eaten, and not turned back.
But maybe the program failed to discover the bytes which hold the score, and so it didn't think eating dots was a particular notable activity.
I imagine the bots you mentioned are programmed with specific logic for those games. As for the NES emulator capable of playing the second player, I imagine that would be a similar story.
The research here is not a computer program that plays games. Rather, it's a computer program that (attempts to) play any game, without having any understanding of what it means to play that game, purely by watching the buttons that a human player would press while playing that game.
4 directions and two buttons. You could point a crappy machine learning algo at it, and it would figure it out. That is far fewer variables to manipulate than what a Quake bot has to deal with.
I would bet money I could zap a pigeon every time it died in mario and it would learn to be a master in 12 hours. It won't master CoD anytime soon.
How would it figure out what it means to be "winning"? It might be possible, but it would require using OCR to read the "score" (and guessing at what is meant by the score) and would also need some way to determine whether the goal is to "go to the right", etc. (also possible). Doing this would be interesting and more practical in that it could work with games where you do not have access to the memory, but it seems like it would require more work in interpreting how the game's state has changed, and it wouldn't allow going back in time.
A Quake bot is easier to develop because it is written for a specific game.
I accept that bet for training a pigeon to play SMB :), although the scenario you described isn't comparable (it is only one game and you would be giving the pigeon extra feedback about how to play it. It still seems hard and I think it would be notable if you achieved it.)
There are Guitar Hero bots that "OCR" (your word) the screen to play the guitar. Not such a hard trick to read a score or see a turtle. (You know we have self-driving cars, right?)
I think trying to understand what is going on in a game by looking at the screen would be more challenging than trying to understand by looking at the game's memory if you want to write a bot that can play any generic game.
The examples you gave are each specific to particular domains.
This whole topic reminds me of how it is much easier to develop natural language processing for use in a particular domain compared to generically.
I write "generic" Natural language processing for a living.
Also, "Nintendo" is a particular domain. When you only have a "language" of eight directions, A, and B, and only timing to deal with, in response to a screen that side-scrolls in only one direction, the programming is easy.
A chess engine has more potential decisions for a bishop than this has for a given move. And unlike the NES, the chess engine has to adjust to changes in its opponent's behavior. Given the same input, the NES will make the same choices; you can macro through most of the games.
1. This was a paper submitted for a joke conference. It is in no way a giant leap in AI. It is something that some nerds find amusing. Like an anti-Rube Goldberg machine. It is trying to accomplish something quite complex using a ridiculously stupid approach.
2. It's stupid because it is looking at the memory state as just a list of numbers that change during the gameplay. It has no knowledge of the actual game. It doesn't parse the memory to figure out game state or anything. It doesn't even have any generic idea of what games involve. It figures out "oo, the number at location 243 goes up in the training data. I'll mash buttons to try to make memory location 243 go up".
You're taking this way too seriously. Think of it as someone tinkering with a specific algorithm who happens to overlay it on top of NES games to show that it's working.
It's a cool project. If this is nothing compared to the things you've done, you're welcome to post them; we'd love to see them!
That won't work; the pigeon will stop playing. But a pigeon will recognize paintings by a certain painter, or steer a missile to a target. Relative to the state of the art in machine learning, pigeons are fairly sophisticated, so it means little to claim that a pigeon might learn to do something.
The CoD / Quake bots were programmed to play those games.
The emulator was presumably programmed to play those 40 games, too.
But the real measure of intelligence is to throw something at the AI that it's never seen before, and see how it performs. And that's the magic of this one: it's a general-purpose video game playing AI. You can find a new game that it wasn't specifically designed to play, and it will quite often do a decent job playing it.