
(I work on the Dota team at OpenAI.)

The output is a trained neural network!




Hi gdb, next week I'm giving a presentation on your awesome Dota work to the local data science community in Vancouver, BC. I have reviewed the info your team has released so far and I have a few questions:

- I saw no mention of CNNs. Is it true that CNNs are not used even for the 8x8 terrain grid input?

- do you have any comments about Rapid+PPO vs., say, IMPALA+V-trace? Would the ability to use more off-policy data be very helpful here?

- any comments on how you selected the reward constants?

- was the teamwork parameter (tau) something your team came up with, or was this a known approach?

- the attention keys are most interesting. Can you comment on why they don't flow through the LSTM? Does it make it easier for the network to quickly change unit attention, or is there some other reason?

- any comment on the choice of a single-layer LSTM vs. a multilayer one, ostensibly for operating on longer timescales?

- does this result mean that HRL (hierarchical RL) is less critical than some people thought?

- any comment on the magnitude of compute, like in the post from May?

Thank you for sharing your fascinating work!


Could you go into some more detail on the actual engineering mechanics? Does each bot have an instance of the neural net model that it runs on a separate PC? How often do you feed game state into the net? What's the output of the network (a bunch of movement / item / spell commands?) that is fed back in through the game driver?


Oh, good question, I didn't think of that either. Is there one NN that consumes the state for each of the bot players and then returns the "next action" for that bot, or is there a separate NN for each of the bots? And does that NN run on the LAN machine, or is the LAN machine just running the game code and a Python agent that mediates between the game code and the NN?


I think OP wants to know how the neural network actually plays the game. I think in this case the Dota client has an API for bots that it can use?


Yes, there's a bot API.

We dump state from the bot API each tick and send it over gRPC to a Python agent, which formats the state into a tuple of Numpy arrays. That tuple is passed into 5 neural networks (one per agent), each of which returns a tuple of Numpy arrays. Each tuple is decoded into a semantic action, which is then returned to the game via gRPC.
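The per-tick loop described above can be sketched roughly as follows. This is an illustrative mock-up only: the function and field names (`encode_state`, `decode_action`, the observation schema, the action vocabulary) are invented for the sketch, the gRPC plumbing is omitted, and the real networks are obviously far larger than the stand-in policy here.

```python
import numpy as np

N_UNITS, FEAT, N_ACTIONS = 8, 16, 4
ACTION_NAMES = ["move", "attack", "use_ability", "noop"]  # hypothetical

def encode_state(game_state):
    """Format a raw per-bot state dict into a tuple of NumPy arrays."""
    units = np.asarray(game_state["units"], dtype=np.float32)  # (N_UNITS, FEAT)
    hero = np.asarray(game_state["hero"], dtype=np.float32)    # (FEAT,)
    return (units, hero)

class TinyPolicy:
    """Stand-in for one of the five per-agent networks."""
    def __init__(self, seed):
        rng = np.random.default_rng(seed)
        self.w = rng.standard_normal((FEAT, N_ACTIONS)).astype(np.float32)

    def __call__(self, obs):
        units, hero = obs
        ctx = units.mean(axis=0) + hero          # pool units, mix with hero
        logits = ctx @ self.w                    # score the action heads
        target = int(np.argmax(units[:, 0]))     # crude "unit attention"
        return (logits, np.int64(target))        # tuple of NumPy arrays out

def decode_action(net_out):
    """Decode the network's output tuple into a semantic action."""
    logits, target = net_out
    return {"action": ACTION_NAMES[int(np.argmax(logits))],
            "target_unit": int(target)}

policies = [TinyPolicy(seed=i) for i in range(5)]  # one net per agent

def step(raw_states):
    """One tick: encode each bot's state, run its net, decode its action."""
    return [decode_action(p(encode_state(s)))
            for p, s in zip(policies, raw_states)]
```

In the real system the `encode_state`/`decode_action` boundary is where the gRPC round trip sits; everything between is just array math.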


Is the entire system one agent, which is then replicated across 5 bot instances, or do you have a specific network per hero?


Does the entire NN run on the game laptop or is it passing the tuple back to OpenAI for processing?


NNs don't need many resources to run once trained. Inference is just a bunch of matrix multiplications.


So are 3D engines. It all depends on the size of the operands. Constant factors are important.


Are you seeing this? How can you let this go unchallenged?

https://twitter.com/eternalenvy1991/status/10196414446030520...



