Two months to advance the state of the art on a complex physics-based game with branching paths. Current approaches such as DQN (or, god forbid, DRL in general [1]) barely reach the performance of my three-year-old cousin at maximizing Atari game scores, and are mostly non-transferable to new levels... Good luck.
Of course it's a difficult problem, but why try and dissuade people from attempting it? Nothing novel ever came from somebody who didn't fail first : )
Do note that most RL methods, including DQN, were designed to be very general, while this contest doesn't really require generality. You could design a model specialized to the characteristics of the game; this is what the top winners of the ViZDoom challenge did.
I'm a big retro Sega fan, and I've always wanted to look into doing something like this, but this seems... really difficult. Would the best approach be to jump right in and hope for the best, or are there any sources I should look into?
IMO, the best approach would be to jump right in, without looking into what other people are doing. People have tried to solve these problems using the current trend (deep reinforcement learning), and so far it has not worked out. Most such contests are won by methods which are somewhat different from the current paradigm. For instance, ImageNet 2012 and, to some extent, the ViZDoom challenge were won by methods which departed significantly from the paradigm of the time.
If you have the time to work on this contest, I'd recommend trying a method which is not deep learning, not reinforcement learning, or neither. There will be enough submissions from people trying the obvious ideas, so you're better off trying something unique. Good luck and have fun!
ImageNet has been won, over and over for the past six years, precisely by improvements on "what other people were doing": incremental refinements of convnets.
ConvNets were invented by Yann LeCun, a professor of machine learning who has spent his career investigating these kinds of subjects.
What I'm saying is, machine learning is super interesting. You need to really be ready to invest yourself, though.
The demo GIFs show Sonic 1 and 3, but all 3 Genesis games have slightly different physics and mechanics which could trip up an AI if not trained on all 3 games. Is the challenge just using the Sonic 1 engine?
From the downloaded CSV of game states, all stages from the 3 games are valid training stages, and the custom stages used for testing are derived from all 3 games. Yikes.
I really hope those custom stages use the same art assets as the original games.
Every test run could theoretically start with a jump or some other probing behavior to exercise the physics differences among the 3 games, and use a discriminator network to route to one of three networks individually trained on each game?
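Something like this, maybe (a rough PyTorch sketch of the routing idea; the conv trunk, the game classifier, and the three policy heads are all made up for illustration, not anything from the contest baselines):

    import torch
    import torch.nn as nn

    class GameRouter(nn.Module):
        """Guess which Sonic game a frame comes from, then route to a per-game policy."""
        def __init__(self, n_games=3, n_buttons=12):
            super().__init__()
            # shared conv trunk over the 224x320x3 screen
            self.trunk = nn.Sequential(
                nn.Conv2d(3, 16, kernel_size=8, stride=4), nn.ReLU(),
                nn.Conv2d(16, 32, kernel_size=4, stride=2), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            )
            # discriminator head: which of the 3 games are we in?
            self.game_head = nn.Linear(32, n_games)
            # one policy head per game, each emitting button logits
            self.policies = nn.ModuleList([nn.Linear(32, n_buttons) for _ in range(n_games)])

        def forward(self, frames):                        # frames: [B, 3, 224, 320], float in [0, 1]
            feats = self.trunk(frames)
            game = self.game_head(feats).argmax(dim=1)    # hard routing decision per frame
            logits = torch.stack([p(feats) for p in self.policies], dim=1)
            return logits[torch.arange(len(game)), game]  # [B, n_buttons] from the chosen policy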
I'd imagine part of any reinforcement learning approach would be randomly tweaking the physics for each run, to make it robust against small changes and errors. Handling the different games (assuming they're similar enough) should just be an extension of that.
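You can't easily reach into the ROM and perturb the actual physics, but a cheap proxy (a sketch assuming the Gym wrapper interface of the time, with a 4-tuple step) is to randomize timing-related knobs like frame skip and action stickiness each episode, so the policy can't overfit to exact timings:

    import random
    import gym

    class JitterWrapper(gym.Wrapper):
        """Per-episode randomization of frame skip and sticky actions, as a crude
        stand-in for 'randomly tweaking the physics' between runs."""
        def reset(self, **kwargs):
            self.skip = random.choice([2, 3, 4, 5])       # base env already advances 4 frames per step
            self.sticky_p = random.uniform(0.0, 0.25)     # chance of repeating the last action
            self.last_action = None
            return self.env.reset(**kwargs)

        def step(self, action):
            if self.last_action is not None and random.random() < self.sticky_p:
                action = self.last_action
            self.last_action = action
            total_reward = 0.0
            for _ in range(self.skip):
                obs, reward, done, info = self.env.step(action)
                total_reward += reward
                if done:
                    break
            return obs, total_reward, done, info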
I'm wondering if there's a MOOC that takes you from zero to being able to build a system that learns how to play such a game, maybe focusing a bit less on the math (especially the proofs). I took Andrew Ng's course on Coursera but I feel like the gap between what I know and what's needed for this contest is huge. Am I wrong?
No, you're not wrong. That said, I don't think this is a competition aimed at people with a beginner level knowledge of machine/deep learning. I don't believe there is currently any MOOC that is going to hold your hand from 0 to expert. Given the current state of the field, I believe the only surefire path towards this level of knowledge is either university education or a /lot/ of reading.
Seems like a stretch to call this “transfer learning”. Maybe training on Sonic and testing on Mario.
Would be cool to see some kind of adversarial competition. You train to, say, beat a game level but you test to beat someone else’s submission. (Short on the specifics, I know.)
Transfer learning is what makes this contest currently unique. Usually in these machine learning contests, you hand off your model to the contest runner, and they just run it and evaluate its performance. It has to be already good at the task.
In this contest, you're not just submitting a trained model, you're submitting a Docker environment capable of performing training, which they will run on their secret levels, under specified constraints. So you want to make a highly trained model whose capabilities can be transferred to secret levels.
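So the submission's entry point looks less like "load weights, act greedily" and more like the schematic below, where make_secret_level_env, agent.act, and agent.update are placeholders of mine rather than the contest API:

    def run_evaluation(make_secret_level_env, agent, training_budget_steps=1_000_000):
        """The evaluator hands you a fresh, unseen level and a step budget; your job
        is to keep learning on it, starting from whatever you pre-trained."""
        env = make_secret_level_env()
        obs, steps = env.reset(), 0
        while steps < training_budget_steps:
            action = agent.act(obs)                             # pre-trained policy as the starting point
            next_obs, reward, done, info = env.step(action)
            agent.update(obs, action, reward, next_obs, done)   # keep training at test time
            obs = env.reset() if done else next_obs
            steps += 1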
Yeah, any side scroller...so dying is bad, but getting hit and shrinking or losing all your rings is not as bad, killing enemies is good, and collecting a power-up is good, but only when it matters.
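i.e. something like a shaping function layered on top of the contest's progress reward; the lives/rings/score keys below are guesses at what the training-time info dict might expose, not documented names:

    def shaped_reward(env_reward, info, prev_info):
        """Add side-scroller common sense on top of the progress reward.
        The 'lives', 'rings', and 'score' keys are assumptions, not the
        documented info dictionary."""
        bonus = 0.0
        if prev_info:
            bonus -= 2.0 * max(0, prev_info.get("lives", 0) - info.get("lives", 0))   # dying is bad
            bonus -= 0.5 * max(0, prev_info.get("rings", 0) - info.get("rings", 0))   # losing rings is less bad
            bonus += 0.01 * max(0, info.get("score", 0) - prev_info.get("score", 0))  # kills/power-ups are good
        return env_reward + bonus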
Can we move away from video games? I know games provide a closed loop for these kinds of experiments/contests, but I want to see more practical applications of RL. How about code generation that creates programs based on test cases, or an RL agent that can use 3D design software?
The problem is that reinforcement learning is far from solved and doesn't work all that well yet, so these toy problems are probably what researchers will stick with for some time to come.
Video games are a great environment because they are quite complex, they are already coded, and we understand them. For basic research it doesn't matter whether it is a game, test cases, or 3D design. They use games because it is less work: the games already exist and are easy to integrate. Moreover, each game is a different domain.
Couldn't a new level contain some sort of new scenario that would be completely impossible to navigate based on previous experience without some very general AI?
> Each timestep advances the game by 4 frames, and each observation is the pixels on the screen for the current frame, a shape [224, 320, 3] array of uint8 values. Each action is which buttons to hold down until the next frame (a shape [12] array of bool values, one for each button on the Genesis controller, where invalid button combinations (Up+Down or accessing the start menu) are ignored) ... During training you can access a few variables from the memory of the game through the info dictionary. During testing, these variables are not available.
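Concretely, with the publicly released Gym Retro API that interface looks roughly like this (a sketch; it assumes you've imported the Sonic ROM locally, and the contest's remote evaluation wraps things differently):

    import numpy as np
    import retro

    # Sonic 1, Green Hill Zone Act 1, as named in Gym Retro's data files
    env = retro.make(game="SonicTheHedgehog-Genesis", state="GreenHillZone.Act1")
    obs = env.reset()
    print(obs.shape, obs.dtype)                       # (224, 320, 3) uint8, one screen frame

    done = False
    while not done:
        action = np.zeros(12, dtype=bool)             # 12 Genesis buttons, all released
        action[env.buttons.index("RIGHT")] = True     # hold right
        action[env.buttons.index("B")] = np.random.rand() < 0.1   # occasionally jump
        obs, reward, done, info = env.step(action)    # info exposes game RAM variables during training
    env.close()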
Is this a fair representation of a human playing a game? Usually as a human player I don't go into a level expecting to clear it never having seen it before.
It's fair because you get better at gaming in general when you play games. For instance, your performance on Mario will be better after playing a lot of Sonic.
This is in contrast to current models which generally need to be trained up from scratch on each new game, even if the games are mechanically similar.
I think maybe a reasonable choice would be to give the AI the ability to store some limited amount of information, let's say 4 KB, which it gets to populate on a first run and draw from on a second run. Second-run scores are compared.
That's what makes this contest interesting: they're not going to just evaluate a static agent. You submit a Docker environment capable of performing learning, and they'll let your agent learn on their secret levels, on their servers.
That provides a rather unique challenge: making a good general baseline agent, but one that is also flexible enough for further training.
> The reward your agent receives is proportional to its progress to the predefined horizontal offset within each level, positive for getting closer, negative for getting further away. If you reach the offset, the sum of your rewards will be 9000. In addition there is a time bonus that starts at 1000 and decreases linearly to 0 at the end of the time limit, so beating the level as quickly as possible is rewarded.
So mostly time, but past a certain threshold just completion. And if you don't finish, "portion completed".
Well if you get to the time limit then it ends wherever you are, so this is literally just how long it took to complete, or how far you got if you did not complete. Completing gives you 9000 points and the max time bonus is 1000 so it heavily incentivises winning consistently versus higher variance strategies that may be quicker.
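As a back-of-the-envelope check, just encoding the quoted rules (the 4500-step limit below is only an example figure, not an official number):

    def episode_score(completion_fraction, steps_used, time_limit_steps):
        """Progress reward sums to 9000 on completion; time bonus decays
        linearly from 1000 to 0 over the time limit (per the quoted rules)."""
        progress = 9000 * completion_fraction
        time_bonus = 1000 * (1 - steps_used / time_limit_steps) if completion_fraction >= 1 else 0
        return progress + time_bonus

    # Finishing halfway through the time limit: 9000 + 500 = 9500
    print(episode_score(1.0, 2250, 4500))
    # Timing out 60% of the way through the level: 5400
    print(episode_score(0.6, 4500, 4500))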
[1] https://www.alexirpan.com/2018/02/14/rl-hard.html