OpenAI Five Benchmark (blog.openai.com)
209 points by gdb on July 18, 2018 | hide | past | favorite | 77 comments



I'm surprised they felt confident enough to lift so many of the restrictions this soon. A lot of them seemed like big game-changers I wouldn't expect the AI to exploit so easily, especially the introduction of Roshan and invisibility. They mention implementing some randomization, but taking Roshan seems like it would require quite a big commitment of resources, where the AI wouldn't even realize the benefits immediately (namely the Aegis, and the XP and gold it would have to weigh against the loss from farming/taking towers).

The introduction of the other heroes also comes as a surprise; I wouldn't have expected them to have the AI utilizing new abilities yet. They don't mention how the heroes are picked, other than the AI getting a random draft of them (does the AI pick its own composition?)


I wonder if the reward mechanisms factor in denying resources from the enemy team. A lot of the time it makes sense to take Roshan just so the enemy doesn't get the opportunity to.


Yes, the reward for a team includes a term that is the negative of the enemy team's reward. This was mentioned in the previous blog post. Here it is: https://gist.github.com/dfarhi/66ec9d760ae0c49a5c492c9fae939...
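For intuition, the zero-sum shaping described there might look something like this (a sketch; the function name and exact weighting are my assumptions, not the published reward code):

```python
# Hedged sketch of a zero-sum team reward: each team's shaped reward
# subtracts the opposing team's raw reward, so denying the enemy gold/XP
# is worth exactly as much as earning it yourself.
def zero_sum_rewards(radiant_raw, dire_raw):
    return radiant_raw - dire_raw, dire_raw - radiant_raw

r, d = zero_sum_rewards(3.0, 1.0)
assert r == 2.0 and d == -2.0  # the two shaped rewards always sum to zero
```

Under a reward like that, taking Roshan purely to deny it would fall out of the objective automatically.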


I think that'd be an even harder behavior for the AI to arrive at, though, since it would have to recognize those benefits for the other team and their origin, and then figure out that taking Roshan denies them. That seems like a step beyond discovering the benefit of taking Roshan for itself.


The same argument applies to pushing and defending towers, getting rax, etc. Assuming the devs did not hardcode rewards for those objectives, the AI surely already has to 'understand' events that impact the game in the long term.


I think you're right about it being more about denying the enemy team. IIRC, in the last update they said that the bots tend to prioritize denying creeps and taking objectives over getting perfect last hits in lane.


> We’ve increased the reaction time of OpenAI Five from 80ms to 200ms. This reaction time is much closer to human level, though we haven’t seen evidence of changes in gameplay as OpenAI Five’s strength comes more from teamwork and coordination than reflexes.

This new constraint is interesting. The Super Smash Bros. Melee AI paper noted that they had to keep reaction times at superhuman levels in order for the model to converge (though Dota is quite different from Melee): https://arxiv.org/abs/1702.06230


The author of that paper has since built an agent that has human-level reaction time and is competitive with professional players: http://youtube.com/vladfi1


Yes, but the fact that PPO can learn very long-range strategies with enough computation, when most would expect it to diverge or fail to learn at all, was already demonstrated by the original 5v5 Dota bot. That's probably the same thing here: it can handle the long-range learning and so does fine at human-level APM, while the SSBM AI is stuck learning short-term strategies which rely heavily on simple, fast reactive policies.


There's nothing about PPO that helps it learn long-range strategies. It primarily lets you take multiple gradient steps on a single batch of experience so you can converge faster.

In fact, for a single step with no policy lag, it's equivalent to a standard policy gradient update.
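To make the single-step claim concrete, here's a minimal numpy sketch of PPO's clipped surrogate: with no policy lag the probability ratio is exactly 1, the clip is inactive, and the objective (and hence its gradient) matches a vanilla policy-gradient update.

```python
import numpy as np

def ppo_surrogate(logp_new, logp_old, adv, clip_eps=0.2):
    """PPO clipped surrogate objective (per-sample, to be maximized)."""
    ratio = np.exp(logp_new - logp_old)
    return np.minimum(ratio * adv,
                      np.clip(ratio, 1 - clip_eps, 1 + clip_eps) * adv)

# With logp_new == logp_old (no policy lag), ratio == 1 everywhere, the
# clip never triggers, and the surrogate reduces to the plain advantage
# term — i.e. a standard policy gradient update.
adv = np.array([1.5, -0.7, 0.3])
logp = np.array([-1.0, -2.0, -0.5])
assert np.allclose(ppo_surrogate(logp, logp, adv), adv)
```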

DeepMind was also able to train a CTF agent with human-level reaction time: https://deepmind.com/blog/capture-the-flag/

I suspect the difference that allows you to train with reaction time is an RNN or compensating for the lag some other way. I'm testing that out right now with my own SSBM bot: https://www.twitch.tv/vomjom


> There's nothing about PPO that helps it learn long-range strategies.

Exactly. Which is why it's so surprising that it did anyway, despite that and despite discount rates which don't assign any value past a minute or so.
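As a rough illustration of why the discount rate bounds the useful horizon (the gamma and step rate below are made-up example numbers, not OpenAI's published values):

```python
# Illustrative only: effective planning horizon is roughly 1/(1 - gamma)
# steps; convert to seconds at an assumed observation rate.
def horizon_seconds(gamma, steps_per_second):
    return 1.0 / (1.0 - gamma) / steps_per_second

# At ~7.5 observations/sec, gamma = 0.99 only "sees" ~13 seconds ahead,
# yet the bots exhibit behaviors whose payoff arrives minutes later.
print(round(horizon_seconds(0.99, 7.5), 1))  # ~13.3
```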

> DeepMind was also able to train a CTF agent with human-level reaction time: https://deepmind.com/blog/capture-the-flag/

Note that the CTF agent is way more complex, featuring multilevel RL and evolutionary losses, and even DNC in the agents.


> Because our training system Rapid is very general, we were able to teach OpenAI Five many complex skills since June simply by integrating new features and randomizations. Many people pointed out that wards and Roshan were particularly important to include — and now we’ve done so. We’ve also increased the hero pool to 18 heroes. Many commenters thought these improvements would take another year.

The linked commenters thought that getting to "real Dota" (more than 100 heroes, Captains Mode instead of random, ...) would take another year. So I don't think it's fair to make that statement.

Edit: Don't get me wrong, I think the improvements are very nice, but pointing at people and saying "these people thought we would need a year, we did it in under a month!" is not something you should do if you didn't actually do what the linked people described.


The 5-invulnerable-couriers restriction seems like something that will have a huge influence on how the early game is played, and something the humans won't have any experience taking advantage of.


I don't think it's that complicated to adapt to... everyone should just be pretty much constantly ferrying out regen and harass more aggressively.


People play with 5 invulnerable couriers in Turbo mode. That's essentially what they do: they get their items ASAP.


And bottles are currently disabled so you can't use the most abusive strat that 5 couriers would allow.


I don't think you can bottle-ferry anymore.


You can.



You can't. It was changed sometime this year.

Courier isn't that important. It's being phased out across the recent patches. And there is a popular Dota mod with 5 fast invulnerable couriers.


Courier is important. Otherwise you're forced to go back for healing/items, which loses you XP and gold, and ultimately the game.

Yes there is Turbo, no it's not comparable to regular gameplay.


Regular dota players learn to play without the courier anyway, because someone always feeds it away or uses it to ferry themselves a magic stick.

So maybe playing without a courier at all would be more representative of the pub experience ;)


Must be 1k you are talking about.


Note: the human opponent team is drawn from players who are vastly better than the majority of Dota 2 players, but still vastly worse than the top-tier pro teams.

So next time Elon tweets "OpenAI beats the human players in 5v5",

you know that the game is not broken by AI yet (unlike Go, which AI has indeed broken).


I think from the trajectory it’s pretty obvious who will be on top in five years. Whether the intercept is now, a month from now, or half a year away doesn’t matter all that much.


If I'm understanding correctly, those five players are for the match in early August.

They'll still have a match against top pros at the International in late August.


Does anyone know how the random drafting works? If it's truly random, i.e. randomly picking 5 heroes out of the 18 (CM, DP, ES, Gyro, Lich, Lion, Necro, QoP, Razor, Riki, Nevermore, Slark, Sniper, Sven, Tide, Viper, or WD), then it's much less about teamwork. What if they end up with Razor, QoP, Nevermore, DP, Gyro? The problem is that in Dota, almost every hero has a clear position in the game, much like soccer. Having both teams randomly pick 5 heroes would probably ruin the game and make it really difficult for the humans (think covariate shift), whereas the bot is probably trained on exactly this distribution.


From [1], the teams alternately pick heroes from the pool. In this case the pool is not random, but fixed to the 18 heroes, so calling this Random Draft is confusing. It would be nice if they could confirm that the humans get a choice in which heroes they play; otherwise the larger hero pool is meaningless.

[1]: https://dota2.gamepedia.com/Game_modes#Random_Draft


I assume they're referring to the existing game mode called Random Draft. The pool is usually 50 heroes instead of 18, but you'd run it the same way. Ordered picks like Captains Mode, except no bans.


Sorry, is it really necessary to say nevermore instead of "sf"? I understand some of us have been playing since WC3 days and are used to the old names (wisp, necrolyte, nevermore) but it's actually longer to pronounce AND more confusing nowadays.


That's the Dota equivalent of the "man of culture".


"POTM of the moon" is where it's at


More confusing? No. Ingame voice lines also use the name on occasion.


How could you argue that using non-Dota 2 names isn't more confusing?


Because it is not a non-dota2 name.


That's called All Random.


Yes, but no one plays seriously in All Random mode. Supports are called supports for a reason: they're strong early game without a lot of items. Some have early ganking and counter-ganking abilities; some have healing or harassing abilities, or really good early stats. Carries are stronger mid and late game because they're naturally weaker early game. Because of this, you need to assign farm priority and set up ganks and rotations in the early game. All of these strategies are gone if you play All Random. Basically it becomes a 2k pub trash game, and no one above 4k MMR actually practices All Random daily. Unless you're in SEA, where people just first-pick carries :) :) :)


This will be very interesting to see, despite the vast differences in ruleset that make this much, much less complex than the actual game.

My prediction is that we're very, very far away from AI that can beat the top teams in a 5v5. Amateur teams can easily be beaten simply on the strength of the mechanics (which are very strong on the AI side, beating even pros), but the strategy and coordination of the top teams are out of this world.


Dare to put a time on that prediction? 1 year, 5?


I would say 18 months from August 5 when they do the stream they will still be unable to beat a professional team playing the full game with no restrictions.

Right now they're missing so many key parts of the game: illusions, summons, Bottle, Courier, and most of the heroes. The 18 they have chosen are all fairly straightforward and make drafting simple. I want to see an AI playing Huskar, Io, and Nature's Prophet. Better yet, I want to see an AI that can draft and ban.


Exactly what I think. The way the pros exploit those heroes requires a lot of logical deduction, not just game-sense intuition and tree search (which current AI methods are strong at). If we can combine all three of those, I think we will be very close to AGI.


I really hope OpenAI taught the bots to type "cyka" and "go mid"


Don't forget safelane pos 1 feeding at minute 6 and typing "gg mid no gank"


Aggressive tipping is also of the utmost importance


Fewer restrictions is of course better, but I'm still not impressed without an actual Captain's Mode all-heroes draft. In addition to inverse_pi's comments about all-heroes being vital because heroes have different roles to play, the draft is both an important part of the game and, I would think, one of the most difficult for an AI: it involves bluffing, mind games, online strategy adjustment in response to an opponent's actions, and awareness of the current meta.

The draft isn't everything, and it's possible that a sufficiently talented AI could always lose the draft and still win the game, but that would be a pretty boring outcome from the perspective of contributing to AI knowledge (just as it's possible, though unlikely in Dota, that sufficiently good micro could overwhelm any disadvantage in strategy and tactics if the AI can play at 2000 APM: it would "win", but only in a very boring sense).


It can't really have 2000 APM because it observes 450 frames per minute.
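The arithmetic: 450 observations per minute works out to one observation roughly every 133ms, which (assuming at most one action per observation) also caps the bot at 450 APM.

```python
# 450 observations per minute -> gap between observations in milliseconds,
# which is also a hard APM ceiling at one action per observation.
obs_per_minute = 450
ms_between_obs = 60_000 / obs_per_minute
print(round(ms_between_obs, 1))  # ~133.3 ms
print(obs_per_minute)            # max APM: 450
```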


I chose that number sort of arbitrarily to just mean "very very fast", but I see your point.


Furthermore, they are limiting the reaction time to 200ms, to match good humans (I suspect some pros are actually faster than that) and remove any advantage there. So it doesn't have a meaningful advantage over pros in any mechanical/reaction time sense; it's truly just trying to play the game more intelligently from what I can tell.


Still has the advantage of simultaneously observing thousands of game variables from the API at a glance.


Ah yeah good point


With that reaction time it's trying to pass a Dota Turing test.


Based on the chart showing the effect of "We're still fixing bugs" in their last blog post, it looks like they should have the skills 'buffer' to handle significantly better teams than those they have faced so far.

https://blog.openai.com/openai-five/

"We’re still fixing bugs. The chart shows a training run of the code that defeated amateur players, compared to a version where we simply fixed a number of bugs, ..."

Looking at the chart and the fact that they are confident enough to lift several restrictions, I'd bet on OpenAI Five winning against at least some of the professional teams at The International. It's even possible they will beat most teams there.


I am embarrassed to say that I am confused about what the outputs are from the "Rapid" RL training system. Do you end up with an executable that then drives the game inputs/api? Does it produce a "bot script" that is used by the game to drive the logic? I understand that thousands of CPUs/GPUs are used for the training, but then what is actually playing the game at the end of the day?


(I work on the Dota team at OpenAI.)

The output is a trained neural network!


Hi gdb, next week I am giving a presentation on your awesome Dota work to the local data science community in Vancouver, BC. I have reviewed the info your team has released so far and I have a few questions:

- I saw no mention of CNNs, is it true CNNs are not used even for the 8x8 terrain grid input?

- do you have any comments about rapid+PPO vs say impala+vtrace? Would the ability to use more off-policy data be very helpful here?

- any comments on how you selected the reward constants?

- was the teamwork/tau something your team came up with or was this a known approach?

- the attention keys are most interesting, can you comment on why they don't flow through the LSTM? Does it make it easier for the network to quickly change unit attention, or is there some other reason?

- any comment on the choice of single-layer LSTM vs multilayer, ostensibly for operating on longer timescales?

- does this result mean that HRL is less critical than some people thought?

- any comment on magnitude of compute, like in the post from may?

Thank you for sharing your fascinating work!


Could you go into some more detail on the actual engineering mechanics? Does each bot have an instance of the neural net model that it runs on a separate PC? How often do you feed game state into the net? What's the output of the network (a bunch of movement/item/spell commands) that's fed back in through the game driver?


Oh, good question, I didn't think of that either. Is there one NN that consumes the state for each of the bot players and then returns the "next action" for that bot, or is there a separate NN for each bot? And does that NN run on the LAN machine, or is the LAN machine just running the game code and a Python agent mediating between the game code and the NN?


I think OP wants to know how the neural network actually plays the game. I think in this case the dota client has an api for bots that it can use?


Yes, there's a bot API.

We dump state from the bot API each tick and send it over gRPC to a Python agent, which formats the state into a tuple of Numpy arrays. That tuple is passed into 5 neural networks (one per agent), each of which returns a tuple of Numpy arrays. Each tuple is decoded into a semantic action, which is then returned to the game via gRPC.
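A toy sketch of that per-tick loop, for anyone trying to picture it. Every name here (encode_state, Policy, decode_action) and the tiny observation/action spaces are invented for illustration; this is not OpenAI's actual code or bot API.

```python
import numpy as np

def encode_state(game_state):
    # Flatten the dict of game variables into one observation vector
    # (the real system produces a tuple of many arrays).
    return np.array([game_state["gold"], game_state["hp"]], dtype=np.float32)

class Policy:
    """Stand-in for one hero's trained network: observation -> action logits."""
    def __init__(self, n_actions, seed):
        rng = np.random.default_rng(seed)
        self.w = rng.normal(size=(2, n_actions))

    def forward(self, obs):
        return obs @ self.w  # one "action head" of logits

def decode_action(logits, names=("move", "attack", "cast")):
    # Turn raw network output back into a semantic action for the game.
    return names[int(np.argmax(logits))]

def step(game_state, policies):
    """One tick: state -> arrays -> per-hero networks -> semantic actions."""
    obs = encode_state(game_state)
    return [decode_action(p.forward(obs)) for p in policies]

heroes = [Policy(3, seed=i) for i in range(5)]  # one network per hero
print(step({"gold": 600.0, "hp": 0.8}, heroes))
```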


Is the entire system one agent, which is then replicated across 5 bot instances, or do you have a specific network per hero?


Does the entire NN run on the game laptop or is it passing the tuple back to OpenAI for processing?


NNs don't need much resources to run once trained. It's just a bunch of matrix multiplications.


So are 3D engines. It all depends on the size of the operands. Constant factors are important.


Are you seeing this? How can you let this go unchallenged?

https://twitter.com/eternalenvy1991/status/10196414446030520...


What is the purpose of having deep learning run on games like AlphaGo and DOTA2, instead of having them train on more general or real world tasks? Is it a constraint on the amount of data, since in video games you can easily generate more?


The data generation is indeed one key aspect of it. To train a reinforcement learning model such as this one, you do need an insane amount of data (they wrote somewhere that the model played the equivalent of 180 years of Dota per day).

Overall, games are a good playground to test ideas and verify assumptions. The next step to transfer this type of knowledge to real world problems would be to build a simulator, train on it using ungodly amounts of computing resources, and then fine-tune the final model on the real world thing. This has been done for robot control tasks in the past. But first, you have to develop and prove that the base learning algorithm works -- and games are nice for that.

This here is also a good showcase of collaboration learned by RL agents, and beating pro teams in an esport where prize pools range in the millions of dollars is an amazing way to convince people.
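Back-of-the-envelope on the "180 years per day" figure mentioned above: generating that much gameplay per wall-clock day implies an aggregate speedup of tens of thousands of times real time across all the parallel games.

```python
# How much faster than real time must the parallel games run, in aggregate,
# to produce 180 years of play per day?
years_per_day = 180
speedup = years_per_day * 365.25  # game-days generated per wall-clock day
print(round(speedup))             # ~65,745x real time
```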


- You can't have thousands of years of real world tasks for low cost

- Clearly defined goal


Training RL agents in the real world is expensive and thus not parallelizable. The current focus on games and VR simulations of robots is exactly because of this reason. The RL agents are much more "sample inefficient" than humans, meaning they need more experiences to learn a skill.

And we humans (and animals) have a huge environment with billions of agents and millions of years of evolution behind us, which allows us to come preloaded with good instincts. They are trying to replicate that process in a few months.


How do 5 DotA players coordinate? They share information via voice?

How does a deep learning algorithm coordinate between 5 heroes? I assume it's not 5 bots communicating over some channel, but one bot acting on 5 heroes?


Surprisingly, it's 5 completely separate bots:

"OpenAI Five does not contain an explicit communication channel between the heroes’ neural networks. Teamwork is controlled by a hyperparameter we dubbed “team spirit”"

- https://blog.openai.com/openai-five/
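A minimal sketch of the weighted-average idea behind "team spirit" (the exact published formula may differ; this is just to show how one hyperparameter can interpolate between selfish and fully shared rewards):

```python
import numpy as np

def team_spirit_rewards(own_rewards, tau):
    """Blend each hero's own reward with the team mean.

    tau=0: purely selfish heroes; tau=1: everyone optimizes the team mean.
    """
    own = np.asarray(own_rewards, dtype=float)
    return (1 - tau) * own + tau * own.mean()

r = [1.0, 0.0, 0.0, 0.0, -1.0]
assert np.allclose(team_spirit_rewards(r, 0.0), r)          # selfish
assert np.allclose(team_spirit_rewards(r, 1.0), [0.0] * 5)  # fully shared
```

With no communication channel, a shared reward like this is the only pressure toward coordinated behavior.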


So the bots do not communicate directly?


The bots presumably "learned" that the other heroes act the way that the bot would have acted were they in that position (i.e. all my allies run the same algorithm, so I can predict what they would do)


That seems like a pretty serious disadvantage


How does counter-picking work if it's no longer mirror match? Was this a separate model?


According to the article the bots are only doing random picks from a pool of 18 heroes for now. I imagine pick/ban will come with later iterations.


They will apparently be playing Random Draft mode, which normally means that it works similarly to All Pick, except for the fact that the players pick from a pool of 50 random heroes. How this will work with the 18-hero pool is something I don't know.


It's not that clear. They call it "Random Draft", but this picking mode in Dota 2 doesn't mean "doing random picks". "Doing random picks" is the "All Random" mode.



