Playing Pokemon Red with Reinforcement Learning (github.com/pwhiddy)
169 points by GaggiX 11 months ago | 69 comments



In the video (at about 4:30), they identify an issue with novelty search - the agent is rewarded for finding areas of the map with irrelevant animations like waves in water, swaying flowers or NPCs walking randomly, as each new frame will have lots of changed pixels. The author fixes the problem by just increasing the threshold for changed pixels needed to receive a reward. This is a good idea, but might run into issues in later parts of the game where the screen is mostly animated water.
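Here is a minimal sketch of the thresholded pixel-change novelty reward described above. It assumes frames are flat lists of grayscale values, and it rewards a frame only if that frame differs from every remembered frame in more than `min_changed` pixels. The class and parameter names are hypothetical, and the project's own code may measure frame similarity differently.

```python
def changed_pixels(a, b, tol=0):
    """Count pixels whose values differ by more than `tol`."""
    return sum(1 for x, y in zip(a, b) if abs(x - y) > tol)


class PixelNovelty:
    """Simplified novelty reward: a frame earns a reward only if it differs
    from *every* previously rewarded frame in more than `min_changed` pixels."""

    def __init__(self, min_changed):
        self.min_changed = min_changed  # the threshold that gets raised to ignore small animations
        self.memory = []                # frames that already earned a reward

    def reward(self, frame):
        # all() over an empty memory is True, so the very first frame is always novel.
        if all(changed_pixels(frame, seen) > self.min_changed for seen in self.memory):
            self.memory.append(list(frame))
            return 1.0
        return 0.0
```

Raising `min_changed` is the fix the comment describes. Small animations such as rippling water stay under the threshold, while a genuinely new area does not. The linear scan over `memory` is only there for clarity; a real agent would want a nearest-neighbour index.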

Interestingly, this problem has been studied previously - an OpenAI paper called it the "noisy TV problem", since an agent could in principle get maximum novelty reward for just staring at a TV screen showing random noise.

The solution they proposed (https://openai.com/research/reinforcement-learning-with-pred...), called Random Network Distillation, uses a pair of networks: one learned, the other randomly initialized and then kept fixed. The goal of the learned network is to predict the output of the randomly initialized one on the newly observed frame. If the output is easily predicted, the frame is similar to what has already been seen, even if it has different TV noise or water patterns. They got some pretty good results on Atari and other games, so it'd be interesting to see if it works well on Pokemon.
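Below is a toy sketch of Random Network Distillation. It assumes a linear predictor and a single-layer tanh target, whereas the paper uses convolutional networks. The intrinsic reward is the predictor's mean squared error against the fixed random target, and it shrinks for observations the predictor has already been trained on. The class and method names are illustrative and do not come from an existing library API.

```python
import math
import random


class RandomNetworkDistillation:
    """Minimal RND sketch: intrinsic reward = predictor's error on a fixed random net."""

    def __init__(self, obs_dim, out_dim=8, lr=0.05, seed=0):
        rng = random.Random(seed)
        # Target network: randomly initialised once and never trained.
        self.target = [[rng.gauss(0, 1) for _ in range(obs_dim)] for _ in range(out_dim)]
        # Predictor network: a linear model trained to imitate the target.
        self.pred = [[0.0] * obs_dim for _ in range(out_dim)]
        self.lr = lr

    def _target_out(self, obs):
        return [math.tanh(sum(w * x for w, x in zip(row, obs))) for row in self.target]

    def _pred_out(self, obs):
        return [sum(w * x for w, x in zip(row, obs)) for row in self.pred]

    def intrinsic_reward(self, obs, update=True):
        t, p = self._target_out(obs), self._pred_out(obs)
        err = [pi - ti for pi, ti in zip(p, t)]
        reward = sum(e * e for e in err) / len(err)  # mean squared prediction error
        if update:
            # One SGD step on 0.5 * e^2; its gradient w.r.t. weight [i][j] is e_i * x_j.
            for row, e in zip(self.pred, err):
                for j, x in enumerate(obs):
                    row[j] -= self.lr * e * x
        return reward
```

The target is a deterministic function of the observation, so repeated visits to the same observation drive the error (and the reward) toward zero. The paper also normalizes observations and rewards, which this sketch omits.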


> The author fixes the problem by just increasing the threshold for changed pixels needed to receive a reward. This is a good idea, but might run into issues in later parts of the game where the screen is mostly animated water.

They could've also implemented this by reading the game memory. Pokemon Red internally keeps track of which areas the player has visited before.

I think the reason they opted for the visual approach is because they didn't want to give the AI any knowledge that isn't readily accessible to the player. Or they simply wanted more granularity than what the game keeps track of.
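This is a sketch of the memory-based alternative. It assumes the emulator exposes a byte-read function and that the current map ID lives at 0xD35E (the `wCurMap` label in the community pokered disassembly; verify this against your own ROM). It does not read the game's own visited-town flags. Instead it keeps its own set of map IDs seen so far. `VisitedMapReward` and `read_byte` are hypothetical names.

```python
from typing import Callable

# Assumption: Pokemon Red's current map ID is at WRAM address 0xD35E
# (wCurMap in the pokered disassembly). Check this before relying on it.
CUR_MAP_ADDR = 0xD35E


class VisitedMapReward:
    """Reward the agent once for each map ID it enters for the first time."""

    def __init__(self, read_byte: Callable[[int], int], bonus: float = 1.0):
        self.read_byte = read_byte  # hypothetical wrapper around the emulator's memory access
        self.bonus = bonus
        self.visited = set()

    def step_reward(self) -> float:
        map_id = self.read_byte(CUR_MAP_ADDR)
        if map_id in self.visited:
            return 0.0
        self.visited.add(map_id)
        return self.bonus
```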


They read game state from memory for a number of things they might've done visually (health bars, for example). Recognizing when the player has entered a new screen seems in bounds; humans can recognize that easily.


The problem is that when a map has animated tiles with different frequencies, the overall animation period of the map is the least common multiple of the periods of the individual tiles. For instance, a map covered in tiles with 10 frames will also have a period of 10 frames. But if the map has, say, both 10-frame and 11-frame tiles, then the period of the map is lcm(10, 11) = 110. That equals the product 10*11 only because 10 and 11 share no common factor; tiles with 10 and 15 frames would repeat every 30. Basically, maps containing different animated tiles can potentially have lots of different screen states, while maps with the same tiles have a low number of screen states.
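The period argument can be checked directly. The screen repeats after the least common multiple of the tile periods, which equals their product only when the periods are pairwise coprime. The helper name `screen_period` is illustrative.

```python
from functools import reduce
from math import gcd


def screen_period(tile_periods):
    """Frames until the whole screen repeats: the LCM of the tile animation periods."""
    # lcm(a, b) = a * b // gcd(a, b); folding from 1 handles any number of tile types.
    return reduce(lambda a, b: a * b // gcd(a, b), tile_periods, 1)
```

For example, `screen_period([10, 11])` is 110, but `screen_period([10, 15])` is only 30.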

I think maps full of a single repeated tile wouldn't be a problem, so the threshold might work in pretty much the whole game.


This reminds me a bit of Twitch Plays Pokemon[1]. In that setup, it was almost impossible to accurately control the character, which meant it behaved close to a random walk. In addition to hundreds of thousands of (disagreeing) people typing at once, there were dozens of seconds of latency between the inputs and the observed results. In the end, the hive mind had to figure out how to guide that randomness towards success.

[1] https://en.wikipedia.org/wiki/Twitch_Plays_Pok%C3%A9mon


the first twitch plays pokemon was wild because nobody knew if beating the game was at all possible, and if it were, nobody knew how much time it would take

note that TPP inputs resemble random inputs but aren't! The game is made easier because people could and did collaborate through the chat, and simultaneously made harder because griefers could and did attempt to sabotage the game

the only thing that made progress possible imo was that there is no game over screen, and at death you don't lose anything too important; however there was a clear "winning" ending screen (pokemon is a game where you can win but you can not lose), so the collaborative effort to win the game was at an advantage

random inputs put a bound on the expected time it takes to beat the game, but the time is astronomical. the game was beaten much much sooner than this theoretical bound, which means that collaboration eventually won out. this only happened imo due to social effects: if griefers were more determined and more numerous, perhaps the first iteration would still be ongoing

after the first, it kind of became much easier, because people already knew it was possible in principle


> pokemon is a game where you can win but you can not lose

As I recall, there are several ways you can lock yourself out of being able to complete the game (releasing unique pokemons, primarily), so there was a good deal of trepidation & danger when it came to some critical moments and when the trolls got the upper hand.


You are correct, there are ways to do so. There are also softlocks, which are states that are just incredibly hard to get out of (heat death of the universe type scenarios sometimes). Pikasprey Yellow is one YouTuber who chronicles this kind of thing.

But I think the commenter above meant that Pokémon was a game that could be completed Twitch Plays style because failing doesn't end in a game over, as it would in an FPS or a platformer, for example.


Vaguely remembering from having played the game 25 years ago, I think you can maybe release all pokemons with cut and be locked?


It's a tad harder than that. "Common" softlocks generally put you in the Elite Four rooms, with a single underleveled Pokémon that knows Splash and all the others released. The Elite Four rooms are generally the only rooms you can enter but not leave (except by beating the trainer or losing), and then you add further constraints so that losing a battle doesn't free you, like purposely not picking up a lot of optional items that would make it easier to bounce back.


HM moves like cut cannot be deleted, and if released those Pokemon immediately return if the game detects they're the last one

You can't get softlocked by releasing an HM move Pokemon.


I recall the way controls were interpreted from the chat was changed a few times to get past it being too random. At some point a voting window was added, and the most common command among the last n was accepted.
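Here is a minimal sketch of the windowed vote described above. It assumes commands arrive as strings and that the window simply holds the last `n` commands. This is not Twitch Plays Pokémon's actual implementation, which isn't described here, and `VoteWindow` is a hypothetical name.

```python
from collections import Counter, deque


class VoteWindow:
    """Keep the last `n` chat commands and execute the most common one."""

    def __init__(self, n):
        self.recent = deque(maxlen=n)  # older commands fall off automatically

    def add(self, command):
        self.recent.append(command)

    def winner(self):
        if not self.recent:
            return None
        # most_common(1) returns [(command, count)]; ties go to the command seen first.
        return Counter(self.recent).most_common(1)[0][0]
```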


Anarchy vs democracy. Good times.


> Anarchy and Democracy

> TwitchPlaysPokemon now has two modes, anarchy and democracy.

> Anarchy mode is the "old" mode, where everyone's inputs are applied immediately.

> Democracy mode is vote-based and has a more sophisticated input system.

> In order to switch from one mode to the other, the mode that isn't active needs 75% of votes as indicated by the dotted line, the current percentage of votes is indicated by the black line.
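The switching rule in the quoted announcement can be sketched as a pure function. Whether exactly 75% is enough (`>=` versus `>`) is an assumption, and so are the function and mode names.

```python
def next_mode(current, votes_anarchy, votes_democracy, threshold=0.75):
    """Switch modes only when the inactive mode holds at least `threshold` of the votes."""
    total = votes_anarchy + votes_democracy
    if total == 0:
        return current  # no votes: keep the current mode
    inactive = "democracy" if current == "anarchy" else "anarchy"
    inactive_votes = votes_democracy if inactive == "democracy" else votes_anarchy
    return inactive if inactive_votes / total >= threshold else current
```

The inactive mode always needs 75%, so the current mode stays in place anywhere in the 25% to 75% range. That keeps the stream from flip-flopping on small swings in the vote.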

I wonder if this kind of thing could be built into a reinforcement learning model. Like playing Pokémon by playing Twitch Plays Pokémon with reinforcement learning.

I'm thinking of how a house fly can or .. sheep .. can suddenly just go nuts and bounce about to get unstuck from somewhere.


> I'm thinking of how a house fly can or .. sheep .. can suddenly just go nuts and bounce about to get unstuck

People do this too!


The rules were changed on the fly too. Adding a "democracy mode", where individual inputs were no longer applied directly and each action was decided by majority vote, made many parts significantly easier (or perhaps even took the difficulty from impossible to possible).


> nobody knew if beating the game was at all possible

A few people including myself knew. Eventually entropy would settle and once the initial hype was over, it was only a matter of time before the chat would finish the game. Saying that "nobody knew" seems like historical revisionism.


If it were just up to randomness with an unmodified ROM, then (barring a glitch) they would have softlocked at the Safari Zone, which costs money to enter and must be traversed to reach the Surf HM within a relatively tight number of steps. There's only a finite amount of money in the game, so they can only fail at most a few hundred times before the game becomes impossible to beat.
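To get a feel for why pure randomness struggles against a step budget, here is a Monte Carlo sketch. It estimates how often a uniform random walk on an open grid reaches a goal tile within a fixed number of moves. The grid size, start, goal and budget are hypothetical stand-ins, not the real Safari Zone layout.

```python
import random


def reach_probability(width, height, start, goal, budget, trials=2000, seed=0):
    """Monte Carlo estimate: chance that a uniform random walk on an open
    width x height grid reaches `goal` from `start` within `budget` moves."""
    moves = [(1, 0), (-1, 0), (0, 1), (0, -1)]
    hits = 0
    for t in range(trials):
        rng = random.Random(seed + t)  # per-trial seed so results are reproducible
        x, y = start
        for _ in range(budget):
            dx, dy = rng.choice(moves)
            # Toy model: a move into the border leaves the walker in place
            # but still uses up one unit of the budget.
            x = min(max(x + dx, 0), width - 1)
            y = min(max(y + dy, 0), height - 1)
            if (x, y) == goal:
                hits += 1
                break
    return hits / trials
```

Because each trial reuses the same seed, a larger budget can only keep or add hits, so the estimate grows with the budget for a fixed seed.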


> There's only a finite amount of money in the game

Technically untrue, you can use Meowth's Pay Day to generate cash.

https://bulbapedia.bulbagarden.net/wiki/Pay_Day_(move)#Gener...


Meowth was only available in Pokémon Blue. Once you reach the Elite Four and can beat them, money is infinite, but in Red you theoretically can fail at the Safari Zone.


As one of the participants, I think most people believed that it was possible. Strategy docs and images were constantly being linked in chat, and it was clear that incremental progress was being made.


I was there for the first few iterations of TPP and it is probably my favorite moment of the internet.

The lore created around TPP was insanely good and captivating, like ascribing god-like status to certain Pokémons and finding funny interpretations of in-game events and random Pokémon names

Bird Jesus, ATV, Lord Helix :D

https://helixpedia.fandom.com/wiki/Gen_1_(Pokemon_Red)


I love bringing up Pokemon as a humbling example of how AI progress is counterintuitive. Naively we might see games like Go, Chess, StarCraft, and Dota as more "complex," yet current approaches still tend to fail at Pokemon, a game we would expect an early-grade-school child to complete without trouble, because its reward function is so incredibly sparse. I hope that one day we'll get a satisfying tabula rasa solution to narrative-based/world-model games like Pokemon rather than something like "well it turns out a GPT ingested a gameplay walkthrough during training and can regurgitate that as gameplay inputs womp womp."


To be fair, even those early-grade-school children come in with the ability to read the text, while game-playing AI usually does not. If a human tried to play Pokémon Red without being able to read anything (even the numbers!), they would probably succeed eventually, but it would be quite difficult and frustrating. And even without reading, the human would still be making inferences based on how the graphics and sound resemble real-world objects. So for an AI to play the game “properly”, it really shouldn’t be a complete tabula rasa; it should have some of that knowledge too.


> even those early-grade-school children come in with the ability to read the text, while game-playing AI usually does not.

You sure? :D No idea what a grade school is, but I couldn't read a word of English at the age that I played Pokémon on the gameboy!

I did need occasional hints (every couple of weeks) from my older cousin to get unstuck, but afaik I still loved the game, I guess just because it's about Pokémon. I think Cut (meaning both pussy and fuck in Dutch, fun fact) was one of the things I got stuck on: perhaps teaching it to a Pokémon, but more likely figuring out how to use it. It's a number of different things to connect, so you can't button-mash your way through, iirc (it has been a few decades)


Quick anecdote: a nephew of mine, who was in third grade at the time, was playing my brother-in-law’s old Gameboy and some version of Pokémon.

He wasn’t reading the text. He would then get stuck, not knowing what to do, and start over. He did this several times and complained that the game was broken.

Some time later I gave him Pokémon Sword for his Switch, as I thought it’d be good for him. Some time after that he was diagnosed with ADD (to the surprise of no one). Some time after that he proudly told me he beat it.

He also said he didn’t like it that much. Hah. I’ll still count it as a win.


Go-Explore could probably solve this, but yeah, I wouldn't describe that as a satisfying solution yet. DRL has been in hibernation because generative AI has sucked all the oxygen out of the room, but I remain optimistic about directions like Gato working.


> Naively we might see games like Go, Chess, StarCraft, and Dota as more "complex"

The strong AIs made for those games had access to some crucial human "help" though.

* AlphaStar (for StarCraft) was trained on a massive database of human games, while this project started Pokemon Red with a blank slate.

* AlphaGo Zero and AlphaZero did start from a blank slate, but had a hand-coded search algorithm (MCTS) that relied on a perfect simulator of the game's rules to explore various actions (800 simulations per move IIRC) before actually making that action. There's no obvious way to do something similar for Pokemon Red.


The Youtube overview video included within the Github is a great watch and exploration with RL https://www.youtube.com/watch?v=DcYLT37ImBY


Another cool video about an input sequence that is almost guaranteed to win Pokémon playthroughs: https://www.youtube.com/watch?v=6gjsAA_5Agk no AI needed ;)


IMO, this video is probably a better source link


Stupid question: if I have a Pokémon Red cartridge, how am I supposed to “legally get the ROM”? (Other than downloading it from anywhere, since I already has a license for it, so there's no such thing as illegally getting the ROM)


I have experience in this area :D

Here are some relevant notes and pictures of mine about that subject

https://text.nstr.no/~erik.nordstrom/game_boy_color_forever....

https://casa.nstr.no/0001

I bought two different devices for dumping ROM and save game data, and the one that worked best for me was the insideGadgets GBxCart RW.

I even thought about offering as a service that people could ship their GameBoy carts to me and I would dump the ROM and SAV files for them, and then I would ship their cart back to them and provide the dumped data along with a video showing the process of dumping the data.


The easiest way is to use a cartridge ROM dumper such as: Retrode, Flash Boy, GBxCart RW, RetroBlaster 2.0, etc.

There are also a lot of DIY devices using Arduino and other popular boards.


This guy found a really cool method with a flash cart (legal on its own) and a link cable.

https://youtu.be/rORPp6vYMsw


I have not personally used anything listed here but it seems promising.

https://gameboy.github.io/wiki/cartreaders

I believe the same or similar devices can be used to write cartridges with ROM hacks if one is so inclined.


>I already has a license for it

You only have a license for playing it and not a license for copying it.


In my jurisdiction (France) we have the droit à la copie privée (private-copy right) for that, and we actually pay a tax to the local RIAA and MPAA equivalents on all storage hardware for that right, so yes, I have that second license as well.


That wouldn't be considered a license and droit à la copie privée wouldn't apply to the site sending you the file.


> That wouldn't be considered a license

Would you mind expanding on what you're thinking about here? (FWIW the “license” concept doesn't really exist in French law)

> droit à la copie privée wouldn't apply to the site sending you the file.

Sure, the website would be illegal without doubt, but that's independent. Illegal streaming services are illegal as well, but watching them isn't for instance.

For a more similar example, there used to be a website that let you “record TV programs in the cloud” (recording what's on TV is legal because it falls under the «copie privée» definition): the website was prosecuted and found guilty, but none of its customers were.


I believe you can purchase a GameBoy cartridge reader.


Just download it illegally it will be fine.


Yeah, seriously.


There is something unreasonably funny to me about the idea of someone getting carted off to jail for illegally downloading a ROM of a 27 year old game.

The person is timid about doing it, but everyone online was saying to just do it since everyone does and what is the worst that could happen. And then despite all of the very real crimes going on in the world, the police show up at this person's door.


The easiest solution is to simply google the ROM.

Edit: you edited your answer after I replied, so I'll edit mine too, it shouldn't be illegal to download the ROM if you physically own a copy. If you want to do it the hard way, buy a GB dumper.


Do you mean "it shouldn't be illegal" as in you wish it was not illegal or as in "it is probably not illegal"? Because in most jurisdictions (including the USA) it is illegal to download a backup ROM of something you own. Regarding making a personal backup, some jurisdictions are more permissive but some do not even allow it.


In the vast majority of jurisdictions it is illegal to share the ROM, but legal to download it.

You are in the clear if you download it from a ROM site (and own the corresponding cartridge). You are not in the clear if you use filesharing, unless you freeload (download without uploading anything back).


> Because in most jurisdictions (including the USA) it is illegal to download a backup ROM of something you own.

Not a lawyer but I'm suspicious about that: first of all, I don't think downloading is actually illegal, it's either sharing stuff without license or keeping stuff without license that are generally illegal (which is why viewing movies from an illegal streaming website is legal in most places even though the website owner is clearly doing illegal stuff).

Then, if you have a license and the law in your jurisdiction allows personal backup, I think you'd have the right to actually store the thing, so I don't think you'd be breaking the law here.


By "it is probably not illegal" I meant that people always say "use the ROM of games you physically own" when talking about emulators, so I have to imagine it would simply be legal.


I think that’s largely meant as some combination of a joke and an indemnification of the person saying it. I think it’s common knowledge that everyone who does this is using a file they downloaded from a site they found through Google.


> people always say "use the ROM of games you physically own”

People also say “No copyright infringement intended” while uploading full movies to YouTube. People, especially when it comes to matters of copyright, typically have very poor understandings of law...


So what would be the correct understanding of the law in this case? That using an emulator is illegal because you cannot own a copy of the games you own?


It varies based on country and jurisdiction, but from an American / European perspective - some parts of the emulation scene are “definitely not legal” (eg, distributing copyrighted ROMs), and most parts of the scene are “It is complicated. Well-informed people have different predictions, but it hasn’t been tested in court, so we don’t know for sure”.

Moon Channel (an actual lawyer) has a fairly clear and concise summary with particular relation to the Dolphin emulator’s attempt to get itself on Steam: https://www.youtube.com/watch?v=wROQUZDCIMI


Pokemon Red isn't a huge game if you have access to the state (not just pixels) and prune it properly. The RL algorithm used here is so sample inefficient that it's pretty much just a stupid search over the space. You can likely solve the game easier using traditional search algorithms. This shows how to use the wrong tool for the job. It would've been more interesting to use algorithms optimized for this kind of problem, not some off-the-shelf PPO that's just inappropriate.
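Here is a sketch of the traditional-search alternative: breadth-first search over game states that prunes any state whose abstract key (for example map ID, coordinates and a few event flags) has already been seen. It assumes a deterministic, side-effect-free `step(state, action)` function. With a real emulator, that would mean cloning save states. `bfs_inputs`, `step`, `key` and `is_goal` are hypothetical names.

```python
from collections import deque
from typing import Callable, Hashable, Iterable, List, Optional


def bfs_inputs(
    start,
    actions: Iterable[str],
    step: Callable,
    key: Callable[..., Hashable],
    is_goal: Callable[..., bool],
) -> Optional[List[str]]:
    """Breadth-first search over states; returns a shortest input sequence to a goal."""
    actions = list(actions)
    frontier = deque([(start, [])])
    seen = {key(start)}  # pruning: each abstract state is expanded at most once
    while frontier:
        state, path = frontier.popleft()
        if is_goal(state):
            return path
        for a in actions:
            nxt = step(state, a)  # assumption: deterministic, returns a new state
            k = key(nxt)
            if k not in seen:
                seen.add(k)
                frontier.append((nxt, path + [a]))
    return None  # goal unreachable from start
```

How well this scales depends entirely on `key`. If it is too fine the frontier explodes, and if it is too coarse distinct situations get merged.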


It's not a job, it's a toy. This most likely was a learning experience or hobby for the author.


You can probably also just win it without any algorithm at all, using just a specific finite sequence of inputs, as demonstrated for FireRed here https://www.youtube.com/watch?v=6gjsAA_5Agk


> Pokemon Red isn't a huge game if you have access to the state (not just pixels) and prune it properly.

I think the whole point of this project is to let the AI do the pruning


To save people clicking around: they're using PPO for training and the gym is here https://github.com/PWhiddy/PokemonRedExperiments/blob/master...
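For reference, the core of PPO is the clipped surrogate objective from the PPO paper, sketched here per sample. This is not the project's training code, which presumably relies on an existing PPO implementation. The value loss, entropy bonus and advantage estimation are omitted.

```python
def ppo_clip_objective(ratio: float, advantage: float, eps: float = 0.2) -> float:
    """Per-sample PPO clipped surrogate: min(r * A, clip(r, 1 - eps, 1 + eps) * A).

    `ratio` is pi_new(a|s) / pi_old(a|s); PPO maximises the mean of this over a batch.
    """
    clipped = min(max(ratio, 1.0 - eps), 1.0 + eps)
    # Taking the minimum makes the objective pessimistic: large policy changes
    # cannot increase it beyond the clipped value.
    return min(ratio * advantage, clipped * advantage)
```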


I'm just wondering about the impact of some rewards:

1) Rewards for filling the Pokédex (seen + caught).

Could help with exploration, e.g. fishing, the Safari Zone and surfing.

2) A per-battle reward based on the duration of the battle + winning.

Could help with understanding the type system and the different attacks. Perhaps even with assigning TM items.

3) A reward on the first uses of HM moves (decaying reward). Could help with using Fly and/or Surf.
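The three ideas above could be expressed as simple reward terms, sketched below. All function names and weights are assumptions for illustration, and so is the choice to penalize long battles rather than reward them. The inputs (Pokédex counts, battle outcome, HM use count) would have to be read from game memory.

```python
def pokedex_reward(seen: int, caught: int, prev_seen: int, prev_caught: int,
                   w_seen: float = 0.1, w_caught: float = 1.0) -> float:
    """Idea 1: pay only for *new* Pokedex entries, weighting seen and caught separately."""
    return w_seen * (seen - prev_seen) + w_caught * (caught - prev_caught)


def battle_reward(won: bool, turns: int,
                  win_bonus: float = 1.0, turn_penalty: float = 0.01) -> float:
    """Idea 2: reward winning, and winning quickly (each turn costs a little)."""
    return (win_bonus if won else 0.0) - turn_penalty * turns


def hm_reward(uses_so_far: int, base: float = 1.0, decay: float = 0.5) -> float:
    """Idea 3: decaying bonus for HM use; the n-th use (from 0) pays base * decay**n."""
    return base * decay ** uses_so_far
```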


> By default this can use up to ~100G of RAM.

I don't like this AI revolution. The internet revolution was great: you could generate immense value with a computer that cost less than a monthly salary. Almost the same with the smartphone app revolution. But on this new frontier of AI you need some serious cash to even attempt making something worthwhile.


You could certainly do it with much less; it would just take proportionally more time.


i love that people experimenting in this space always take some time to visualize crazy scenes of many actors on screen.


There you go. Put an AI in a virtual world catching Pokemon. Next thing it's in a robot body walking around catching humans; got to catch them all.


i didn't watch the whole video - does it eventually win the game? if so, how long does it take?

also, how do you not run out of memory?


Gets to Mt Moon and it gets stuck in a long hallway when the AI doesn't understand that it's still "exploring" by walking down it, even as it moves, because the screenshots look the same.

Plenty of ways to tweak the reward system to get it to continue.
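One such tweak is to reward new positions instead of new-looking screens. The sketch below assumes the map ID and the player's x/y coordinates can be read at every step. With this reward, a long corridor keeps paying out for every new tile even when the screenshots look identical. The class name and bonus value are hypothetical.

```python
class CoordinateExploration:
    """Reward each (map_id, x, y) tile the first time the player stands on it."""

    def __init__(self, bonus=0.01):
        self.bonus = bonus
        self.seen = set()

    def reward(self, map_id, x, y):
        key = (map_id, x, y)
        if key in self.seen:
            return 0.0  # revisiting a tile earns nothing
        self.seen.add(key)
        return self.bonus
```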


You can beat pokemon just by pressing the A button repeatedly

https://www.youtube.com/watch?v=eZQtia9ZyyQ


The video shows one gym match where the player also uses the d-pad (as expected, I guess; it already seemed like a tall claim). Indeed, this doesn't seem like the difficult part of the game to beat with only A.


If you want to watch their entire twitch stream this is part 1. I posted a highlight because most people aren't gonna watch a multi part recording of a twitch stream.

https://youtu.be/tgAgVwchKYE?si=XG3myxQDjdhijPv0


Hello



