Visual Doom AI Competition (put.edu.pl)
118 points by nopakos on April 22, 2016 | 25 comments



I'd like to see a single-player bot that can do human-level speedruns and/or beat stuff like https://www.twitch.tv/blooditekrypto/v/30795033

Baby steps. Beating other bots in deathmatch is a good start. I love that they only use the rocket launcher, giving the careless bot an equal chance of blowing itself up.

Parsing what's on the screen in Doom is potentially a lot easier than in modern games: since there is no texture filtering or anti-aliasing, and due to the 2.5D perspective, most vertical runs of pixels on the screen map exactly to (linearly?) scaled columns in wall textures or sprites. I would not be surprised if you could come up with a fairly simple algorithm to determine the exact position and orientation of the player and the objects on screen within the map, without any real AI/learning involved.
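
As a rough illustration of that projection argument (a hypothetical sketch; the wall height and focal length below are assumptions, not engine values): the on-screen height of a wall column in a Doom-style renderer is inversely proportional to its distance, so it can be inverted directly.

    # Hypothetical sketch: invert Doom-style column projection.
    # Assumes a known real-world wall height and a known projection-plane
    # distance ("focal length" in pixels); both values are made up here.
    def column_distance(pixel_height, wall_height=128.0, focal_px=160.0):
        """Estimate distance to a wall column from its on-screen pixel height."""
        return wall_height * focal_px / pixel_height

    def column_pixel_height(distance, wall_height=128.0, focal_px=160.0):
        """Inverse check: on-screen height of a wall column at a given distance."""
        return wall_height * focal_px / distance

    print(column_distance(64))   # a 64 px tall column sits ~320 map units away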


How do you figure? The Doom maps are almost all completely dull-looking and gray, aren't lit realistically (so depth from shading becomes harder to determine), and because there is no anti-aliasing, lines are far more jagged (meaning the computer needs to be smarter in order to join disparate lines it 'thinks' are connected). There are barely any distinguishing features separating wall boundaries from texture details either. I think you might be underestimating the challenge here.


> The Doom maps are almost all completely dull-looking and gray

I will fight you ;-)

But seriously: The project uses a source port of Doom[1] based on ZDoom[2] which supports high screen and texture resolutions, mouse look, colored lights etc.

That said, I think that since Doom has less visual clutter (bloom, shadows, ambient occlusion, screen distortion) than a modern AAA game, its display should be easier to read.

[1] https://github.com/Marqt/ViZDoom [2] http://zdoom.org/About


How is an anti-aliased line easier for a computer to read? It should be fairly easy to find lines and use those to get vanishing points for orientation. Distance of enemies should be more or less a mapping of screen height, since you're mostly looking straight ahead. It sounds super doable, especially with machine learning becoming a thing, since you can more or less delegate the task of putting all those inputs together and making sense of them to the computer.


Well, you have more unique features to do localization on.


ViZDoom provides access to the depth buffer.

https://github.com/Marqt/ViZDoom
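
For reference, a minimal sketch of reading it from the Python bindings; option and attribute names follow recent ViZDoom releases (older versions expose image_buffer rather than screen_buffer), so treat this as indicative only, and the config path is an assumption:

    from vizdoom import DoomGame

    game = DoomGame()
    game.load_config("scenarios/basic.cfg")   # assumed path to a bundled scenario
    game.set_depth_buffer_enabled(True)       # ask ViZDoom to expose depth
    game.init()

    game.new_episode()
    state = game.get_state()
    rgb = state.screen_buffer                 # raw pixels the agent sees
    depth = state.depth_buffer                # per-pixel depth, same resolution
    game.close()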


Related: https://www.newscientist.com/article/2076552-google-deepmind...

Google DeepMind AI navigates a Doom-like 3D maze just by looking

Paper: http://arxiv.org/abs/1602.01783



Interesting article, and it poses the question of what I would theorize is the correct way for A.I. to learn to be more human, or to machine-learn at all: in a virtual environment. I.e. create a 3D game to teach it how to interact with the physical world, in order to advance robotic A.I. and human interaction. Similar to how Google's self-driving cars train in simulation and are then tested on the road in real life.

On the point of visual processing, a stealth A.I. company (not for long) working in this space is Magic Pony Technology (http://www.magicpony.technology/), operating out of London; I spotted them at London A.I., which has previously hosted Prediction IO and SwiftKey at its events.

Another caveat for this test is sound: human players will have audio, but the A.I. will be purely visual, so it is at a slight disadvantage.

We are working on speech-to-text and text-to-speech technology for our A.I. voice-enabled finance assistant, which is in beta at WealRo (http://www.wealro.com), with a view to adding visual face recognition at some point so the A.I. can gather information from facial expressions. Always happy to hear any thoughts on the potential usefulness of such an integration.


> Another caveat for this test is sound: human players will have audio, but the A.I. will be purely visual, so it is at a slight disadvantage.

This project says it only uses the screen buffer, but perhaps (and this is beyond my low-level knowledge) there is also an equivalent "sound buffer" it could tap into for the left and right channels?


No, there is no sound buffer. It is an interesting extension to our project but I think the visual information is significantly more important.



The question seems flawed: having an AI make decisions based only on visual information basically conflates how the AI gets the information (visually) with what information the AI gets (only limited information, similar to what the player has). These are two different problems that can be solved completely independently. The first makes no sense for a game (think how computationally intensive it would be), while the latter could be very interesting, since it would push the design of the AI toward that of a natural player. The catch, however, is that it's a "could": in itself there is no reason to imagine such AIs would make the game better in any way (over "cheating" AIs).


Your argument is very unclear, and your last sentence makes no sense. What exactly is the problem with jointly learning perception and control? They are widely considered to be intertwined problems in robotics. Not independent at all. But for what it's worth, you will find researchers working on all sorts of different approaches.

Clearly this competition is an attempt to build off successes in reinforcement learning with agents that play games using only images and scores.


I don't know if your parent comment is correct, but their argument is really easy to follow. I'll put what I think their argument is in different words.

"this contest is like making Google make Alphago have to also include a robot and image recognition and making the robot have to place the stones." obviously that has "nothing to do with" the game and is "how Alphago gets the information."

But Go (and Tetris, etc.) are games of perfect information where perception of the game state is not a challenge. Having access to the internal data structure representing the Go or Tetris board is the same as having to scrape it off the screen and recognize it, or having to do real-world image recognition.

If your parent comment is wrong, it's because Doom is not that kind of game.

So what you consider "intertwined" really isn't, unless you'd also say Google hasn't even built a Go engine, since a human was doing the perception.

(again, I am just saying your parent's argument is easy to follow, not that they're correct in this case.)


I got that part. Neither you nor the original poster gave a good reason why that's a bad idea. After all, "end to end" learning is the holy grail of AI.

The only reason people have historically separated the two (and many still do) is that it's been too hard! But that's the point of research: solving hard problems.


Thank you for the excellent rephrasing.

> But Go (and Tetris, etc.) are games of perfect information where perception of the game state is not a challenge.

In general, "perception of the game state" is not a challenge, at least according to good game design principles (e.g. in danmaku shmups, perception can be a challenge because of visual effects that are not really part of the game, but this is seen as poor game design, similarly to how being unable to differentiate backgrounds from platforms in a run&jump is bad design). Although there are games where the perception of the game state is a game mechanic, but Doom isn't really one.

But even in Doom, you can separate the two tasks quite neatly. The vision task essentially aims to reconstruct a model of the world, but in a video game this model comes for free. You can trivially limit the information an agent gets to what it would get as a player (in games like MGS this is already the case, albeit in a very simplistic way). It's fairly easy to write a function that computes what is visible, what sounds a player would hear, etc. You can then rephrase the problem as: build an AI that can only access this function. This wouldn't change anything.

So for the AI community, I think a more interesting question would have been to design an AI on top of such a function.
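
To make that concrete, here is a hypothetical sketch of such a function; every name is invented for illustration and is not part of any real engine. The engine keeps its full internal state private, and the agent only ever sees the filtered observation:

    from dataclasses import dataclass

    @dataclass
    class Observation:
        visible_enemies: list      # enemies passing a line-of-sight test
        audible_sounds: list       # sounds within earshot of the player
        health: int
        ammo: int

    class FilteredWorld:
        def __init__(self, engine):
            self._engine = engine  # full internal state stays hidden from the agent

        def observe(self, player_id):
            """Return only what the player could perceive this frame."""
            return Observation(
                visible_enemies=self._engine.line_of_sight_targets(player_id),
                audible_sounds=self._engine.sounds_in_range(player_id),
                health=self._engine.health(player_id),
                ammo=self._engine.ammo(player_id),
            )

An agent written against observe() cannot "cheat", yet it never has to solve the pixel-level vision problem.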


If the contestants are using deep learning, I don't see why it should be any more difficult to generate a meaningful, low-dimensional representation of the game-state from raw pixels than from an abstract view input.
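
As a rough sketch of what "from raw pixels" can mean in practice (written with PyTorch purely as an illustration, not part of the competition code), a small convolutional encoder maps a frame to a compact state vector that a policy then consumes:

    import torch
    import torch.nn as nn

    class PixelEncoder(nn.Module):
        """Maps a raw RGB game frame to a low-dimensional state vector."""
        def __init__(self, latent_dim=128):
            super().__init__()
            self.conv = nn.Sequential(
                nn.Conv2d(3, 32, kernel_size=8, stride=4), nn.ReLU(),
                nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
                nn.Conv2d(64, 64, kernel_size=3, stride=1), nn.ReLU(),
            )
            self.fc = nn.LazyLinear(latent_dim)   # infers the flattened size

        def forward(self, frames):                # (N, 3, H, W) floats in [0, 1]
            return self.fc(self.conv(frames).flatten(start_dim=1))

    encoder = PixelEncoder()
    state_vec = encoder(torch.rand(1, 3, 240, 320))   # -> shape (1, 128)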


The data the agent perceives is going to be very different from what it would get with access to Doom's internals, as a 'normal' AI would have. The challenge is to bridge image recognition and intelligence efficiently and effectively.

Of course it does not make sense to use this in a real game implementation. The idea is that such technology could (with further development) be used in real-world scenarios and applications. The problem is simply posed in a game environment to make it interesting and easier to approach.


Think about players hiding behind foliage. The AI cheats that are used now make this a lot less practical / fun than a per-pixel visibility test.
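
A hedged sketch of what a per-pixel visibility test could look like, assuming the engine can hand you a per-pixel object-label mask for the rendered frame (the mask and IDs here are hypothetical):

    import numpy as np

    def enemy_is_visible(label_mask, enemy_id, min_pixels=4):
        """True if enough pixels of the enemy survive occlusion (e.g. foliage)."""
        return np.count_nonzero(label_mask == enemy_id) >= min_pixels

    # label_mask: an H x W array in which each pixel holds the ID of the object
    # actually drawn there, so a sprite hidden behind foliage contributes nothing.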


This sounds fun. I hope they stream some of it on twitch as well.


This looks like great fun; I am working on a convnet to play Doom.


[Content after page loading]

Motivation

Doom has been considered one of the most influential titles in the game industry: it popularized the first-person shooter (FPS) genre and pioneered immersive 3D graphics. Even though more than 20 years have passed since Doom's release, the methods for developing AI bots have not improved significantly in newer FPS productions. In particular, bots still have to "cheat" by accessing the game's internal data such as maps, locations of objects, and positions of (player or non-player) characters. In contrast, a human can play FPS games using a computer screen as the only source of information. Can AI effectively play Doom using only raw visual input?

Goal

The participants of the Visual Doom AI Competition are expected to submit a controller (C++, Python, or Java) that plays Doom. The provided software gives real-time access to the screen buffer as the only information the agent can base its decisions on. The winner of the competition will be chosen in a deathmatch tournament.

Machine Learning

Although the participants are allowed to use any technique to develop a controller, the design and efficiency of the Visual Doom AI environment allow and encourage participants to use machine learning methods such as deep reinforcement learning.

Competition Tracks

1. Limited deathmatch on a known map.

The only available weapon is the Rocket Launcher, with which the agents start. The agents can also gather Medikits and ammo.

2. Full deathmatch on an unknown map.

Different weapons and items are available. Two maps are provided for training. The final evaluation will take place on three maps unknown to the participants beforehand.

Important Dates

    31.05.2016: Warm-up deathmatch submission deadline
    15.08.2016: Final deathmatch submission deadline
    20-23.09.2016: Results announcement (CIG)
Contact

    For announcements and questions subscribe to vizdoom@googlegroups.com
    Bugs: Open a new GitHub ticket
Getting started

    Download (or compile) the ViZDoom environment.
    Follow the instructions.
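
For orientation, a minimal random-action controller in the Python bindings could look roughly like this; method names follow recent ViZDoom examples and may differ slightly between releases, and the config path is an assumption:

    import random
    from vizdoom import DoomGame

    game = DoomGame()
    game.load_config("scenarios/basic.cfg")   # assumed path to a bundled scenario
    game.init()

    # One-hot action vectors; what each slot means is defined by the config.
    actions = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]

    for episode in range(5):
        game.new_episode()
        while not game.is_episode_finished():
            state = game.get_state()
            frame = state.screen_buffer        # the only input a controller may use
            game.make_action(random.choice(actions))
        print("episode reward:", game.get_total_reward())

    game.close()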
What will the Deathmatch Look Like?

Your controller will fight against all other controllers for 10 minutes on a single map. Each game will be repeated 12 times for track 1 and 4 times for track 2, which involves three maps. The controllers will be ranked by the number of frags.

            In the case of a large number of submissions, we will introduce eliminations.
Technical Information

    Each controller will be executed on a separate machine with a single CPU and GPU at its sole disposal.
    The machine specification: Intel(R) Core(TM) i7-4790 CPU @ 3.60GHz + GTX 960 4GB
    Operating system: Windows or Ubuntu Linux 15.04
How to Submit my Entry?

To accept your submission we will need the following data:

    name of the team
    team members and their affiliations
    a description of the method used to create the controller (max. 2 pages, PDF)
    a list of (sensible) software requirements for the agent to run (ask beforehand)
    a link to the source code of your controller and additional files (max 1GB in total)
    instructions on how to build and execute the controller
The form to submit the above data will be provided later.

            In the spirit of open science, all submissions will be published on this website after the competition is finished.
Organizers

Wojciech Jaśkowski, Michał Kempka, Marek Wydmuch, Jakub Toczek


So they are trying to make it learn how to go on a killing spree. Doesn't sound like such a great plan.


Replace the sprites, and instead of slaughtering enemies with the chainsaw, the AI heals wounded civilians with a medikit.

Doom is similar to the Lenna picture in image processing: Somewhat prurient, but well-known. See for example psDooM, where you kill Linux processes: http://psdoom.sourceforge.net/

P.S.: Who will teach the AI cheat codes such as idchoppers?



