Hacker News
OpenAI Retro Contest (contest.openai.com)
197 points by gdb on April 5, 2018 | 48 comments



Two months to advance the state of the art on a complex physics-based game with branching paths. Current approaches such as DQNs or god forbid DRL[1] barely reach the performance of my three-year-old cousin at Atari game score maximization and are mostly non-transferable to new levels... Good luck.

[1] https://www.alexirpan.com/2018/02/14/rl-hard.html


Of course it's a difficult problem, but why try and dissuade people from attempting it? Nothing novel ever came from somebody who didn't fail first : )


I don't consider it dissuasion, but a fair warning & a nudge to prepare properly. Seriously: Good luck!


> Current approaches such as DQNs or god forbid DRL

You say that like you're talking about two separate things. DQN is just one type of DRL.


This should be required reading for anyone wanting to do this contest.


Do note that most RL methods, including DQN, were designed to be very general, while this contest doesn't really require generality. You could design a model specialized to the characteristics of the game. This is what the top winners of the Vizdoom challenge did.


> Two months ... Good luck

Well, it needs to be 2 months so Musk can transfer this learning to Autopilot... /s, I think


I'm a big retro Sega fan, and I've always wanted to look into doing something like this, but this seems... really difficult. Would the best approach be to jump right in and hope for the best, or are there any sources I should look into?


If you want to do OK and understand what's up, watch the awesome free lectures by Alphabet's DeepMind (the team behind AlphaZero) here: https://www.youtube.com/playlist?list=PLweqsIcZJac7PfiyYMvYi...


IMO, the best approach would be to jump right in, without looking into what other people are doing. People have tried to solve these problems using the current trend (deep reinforcement learning), and so far it has not worked out. Most such contests are won by methods which are somewhat different from the current paradigm. For instance, ImageNet 2012, and to some extent the Vizdoom challenge, were won by methods which departed significantly from the paradigm of the time.

If you have the time to work on this contest, I'd recommend trying a method which is not deep learning, not reinforcement learning, or neither. There will be enough submissions from people trying the obvious ideas, so you're better off trying something unique. Good luck and have fun!


ImageNet has been won over and over for the past 6 years very much by building on "what other people were doing", with incremental improvements on convnets.

ConvNets were invented by Yann LeCun, a professor of machine learning who had spent his life investigating these kinds of subjects.

What I'm saying is, machine learning is super interesting. You need to really be ready to invest yourself, though.


The demo GIFs show Sonic 1 and 3, but all 3 Genesis games have slightly different physics and mechanics which could trip up an AI if not trained on all 3 games. Is the challenge just using the Sonic 1 engine?


The challenge is using all 3 Genesis games.


Yes, just found it in the details: https://contest.openai.com/details

From the downloaded CSV of game states, all stages from the 3 games are valid training stages, and the custom stages used for testing are derived from all 3 games. Yikes.

I really hope those custom stages use the same art assets as the original games.


Every test run could theoretically start with a jump or some other behavior to exercise the different physics or whatever among the 3 games, and use a discriminator network to feed into one of three networks individually trained on each game?
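A rough sketch of that routing idea, in Python (the game classifier and the three per-game policies here are hypothetical placeholders, not anything the contest provides):

  import numpy as np

  def pick_policy(first_obs, game_classifier, policies):
      # game_classifier: a small net trained to guess Sonic 1/2/3 from one frame.
      # policies: three agents, each trained separately on one of the games.
      probs = game_classifier.predict(first_obs[np.newaxis])[0]
      return policies[int(np.argmax(probs))]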


The whole point of the OpenAI exercise is to improve transfer learning: a winning project should be able to work off of a single model.

For an example of a system that does learn new rules using a single model, check out this post from Vicarious[0].

[0] https://www.vicarious.com/2017/08/07/general-game-playing-wi...


I'd imagine part of any reinforcement learning approach would be randomly tweaking the physics for each run, to make it robust against small changes and errors. Handling the different games (assuming they're similar enough) should just be an extension of that.


I'm wondering if there's a MOOC that takes you from zero to being able to build a system that learns how to play such a game, maybe focusing a bit less on the math (especially the proofs). I took Andrew Ng's course on Coursera but I feel like the gap between what I know and what's needed for this contest is huge. Am I wrong?


No, you're not wrong. That said, I don't think this is a competition aimed at people with a beginner level knowledge of machine/deep learning. I don't believe there is currently any MOOC that is going to hold your hand from 0 to expert. Given the current state of the field, I believe the only surefire path towards this level of knowledge is either university education or a /lot/ of reading.


Anyone have an intuition for the next step after the parent's suggestion?

Acknowledged, it's a hugely complicated subject.


Seems like a stretch to call this “transfer learning”. Maybe training on Sonic and testing on Mario.

Would be cool to see some kind of adversarial competition. You train to, say, beat a game level but you test to beat someone else’s submission. (Short on the specifics, I know.)


Transfer learning is what makes this contest currently unique. Usually in these machine learning contests, you hand off your model to the contest runner, and they just run it and evaluate its performance. It has to be already good at the task.

In this contest, you're not just submitting a trained model, you're submitting a Docker environment capable of performing training, which they will run with their secret levels, under specified constraints. So you want to make a highly trained model whose capabilities can be transferred to secret levels.


Yeah, any side-scroller... so dying is bad, but getting hit and shrinking or losing all your rings is not as bad, killing enemies is good, and collecting a power-up is good, but only when it matters.

Anything else?

Like BDD for AI.
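A hedged sketch of that kind of reward shaping (the info keys here are invented for illustration; the contest does expose some RAM variables during training, but don't assume these exact names):

  def shaped_reward(env_reward, info, prev_info):
      # env_reward: the contest's own progress-based reward.
      # info / prev_info: hypothetical dicts of per-step game variables.
      bonus = 0.0
      if info.get('lives', 0) < prev_info.get('lives', 0):
          bonus -= 10.0   # dying is bad
      elif info.get('rings', 0) < prev_info.get('rings', 0):
          bonus -= 1.0    # getting hit / losing rings is less bad
      if info.get('score', 0) > prev_info.get('score', 0):
          bonus += 1.0    # killing enemies / grabbing power-ups is good
      return env_reward + bonus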


I think some of the "transfer" part comes from the fact that there are actually three different games involved (Sonic 1, 2 and 3).


Cynical prediction one: Nothing learned here will be readily transferable to another domain. :(


I expect training & testing to make use of emulators & ROMs. Wouldn't Sega potentially have a problem with this, or is it considered fair use?


The loophole suggested in the instructions is to use the ROMs provided with legally purchased copies of the games on Steam.


Can we move away from video games? I know games provide a closed loop for this kind of experiment/contest, but I want to see more practical applications of RL. How about code generation that creates programs based on test cases, or an RL agent that can use 3D design software?


The problem is that reinforcement learning is far from solved and doesn't work all that well yet, so these toy problems are probably what researchers will stick with for some time to come.


Video games are a great environment because they are quite complex, they are already coded, and we understand them. For basic research it doesn't matter whether it is a game, test cases, or 3D design. They use games because it is less work: they already work and are easy to integrate. Moreover, each game is a different domain.


Couldn't a new level contain some sort of new scenario that would be completely impossible to navigate based on previous experience without some very general AI?


What’s the input to the agent? Just all the pixels on the screen?


> Each timestep advances the game by 4 frames, and each observation is the pixels on the screen for the current frame, a shape [224, 320, 3] array of uint8 values. Each action is which buttons to hold down until the next frame (a shape [12] array of bool values, one for each button on the Genesis controller, where invalid button combinations (Up+Down or accessing the start menu) are ignored) ... During training you can access a few variables from the memory of the game through the info dictionary. During testing, these variables are not available.

So yep, the RGB image.
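For the curious, a minimal random-agent loop against a locally installed Gym Retro environment looks roughly like this (the game/state names are the ones listed in the contest's training CSV; the contest wraps the env in its own harness and handles the 4-frame skip, so treat this as illustrative only):

  import retro

  env = retro.make(game='SonicTheHedgehog-Genesis', state='GreenHillZone.Act1')
  obs = env.reset()                           # uint8 array, shape (224, 320, 3)
  done = False
  while not done:
      action = env.action_space.sample()      # length-12 array of button presses
      obs, reward, done, info = env.step(action)
  env.close()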


Is this a fair representation of a human playing a game? Usually as a human player I don't go into a level expecting to clear it never having seen it before.


It's fair because you get better at gaming in general when you play games. For instance, your performance on Mario will be better after playing a lot of Sonic.

This is in contrast to current models which generally need to be trained up from scratch on each new game, even if the games are mechanically similar.


I think maybe a reasonable choice would be to give the AI the ability to store some limited amount of information, let's say 4 KB, which it gets to populate on a first run and draw from on a second run. Second-run scores are compared.
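As a rough sketch of what that protocol could look like (the 4 KB budget and the file path are just this comment's hypothetical, not anything the contest supports):

  MEMORY_BUDGET = 4096  # bytes, the hypothetical limit

  def save_memory(payload: bytes, path='memory.bin'):
      # First run: persist whatever the agent chose to remember.
      assert len(payload) <= MEMORY_BUDGET, 'over the 4 KB budget'
      with open(path, 'wb') as f:
          f.write(payload)

  def load_memory(path='memory.bin') -> bytes:
      # Second run: read it back before the scored attempt.
      with open(path, 'rb') as f:
          return f.read()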


That's what makes this contest interesting: they're not going to just evaluate a static agent. You submit a Docker environment capable of performing learning, and they'll let your agent learn on their secret levels, on their servers.

That provides a rather unique challenge: making a good general baseline agent, but one that is also flexible enough for further training.


It's Sonic, clearing it is really easy for humans.


I couldn't find it: high score or fastest time? Totally different skills.


Best score

Check out https://contest.openai.com/details

I'm not sure how different those skills are from an AI development point of view, though. Why do you think they are?


If I'm reading that right, they're defining score in terms of time, so it's actually the same?


Ugh, you're mostly right, I skimmed too quickly.

> The reward your agent receives is proportional to its progress to the predefined horizontal offset within each level, positive for getting closer, negative for getting further away. If you reach the offset, the sum of your rewards will be 9000. In addition there is a time bonus that starts at 1000 and decreases linearly to 0 at the end of the time limit, so beating the level as quickly as possible is rewarded.

So mostly time, but past a certain threshold just completion. And if you don't finish, "portion completed".


Well, if you get to the time limit then it ends wherever you are, so this is literally just how long it took to complete, or how far you got if you did not complete. Completing gives you 9000 points and the max time bonus is 1000, so it heavily incentivises winning consistently versus higher-variance strategies that may be quicker.
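Putting rough numbers on that, going by the scoring described in the quote above (this is just my reading of it, not the contest's actual code):

  def episode_score(progress_fraction, time_used, time_limit):
      # progress_fraction: how far toward the target offset the agent got, 0..1
      # the time bonus only kicks in if the level was actually completed
      progress_reward = 9000 * progress_fraction
      time_bonus = 0.0
      if progress_fraction >= 1.0:
          time_bonus = 1000 * (1 - time_used / time_limit)
      return progress_reward + time_bonus

  # finishing at half the time limit:           9000 + 500 = 9500
  # reaching 60% of the way before timing out:  5400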


Do you win anything?


Second to last paragraph:

  The contest will run from April 5 to June 5 (2 months)
  and *winners will receive some pretty cool trophies.*


> winners will receive some pretty cool trophies.


Was this just released today?


Yep!



