Hi Gdb, next week I am giving a presentation on your awesome Dota work to the local data science community in Vancouver, BC. I have reviewed the info your team has released so far and I have a few questions:

- I saw no mention of CNNs. Is it true that CNNs are not used, even for the 8x8 terrain grid input?

- do you have any comments about Rapid+PPO vs. say IMPALA+V-trace? Would the ability to use more off-policy data be very helpful here?

- any comments on how you selected the reward constants?

- was the teamwork/tau something your team came up with or was this a known approach?

- the attention keys are most interesting, can you comment on why they don't flow through the LSTM? Does it make it easier for the network to quickly change unit attention, or is there some other reason?

- any comment on the choice of a single-layer LSTM vs. a multilayer one, which would ostensibly operate on longer timescales?

- does this result mean that HRL is less critical than some people thought?

- any comment on the magnitude of compute, like in the post from May?

Thank you for sharing your fascinating work!