
> We’ve increased the reaction time of OpenAI Five from 80ms to 200ms. This reaction time is much closer to human level, though we haven’t seen evidence of changes in gameplay as OpenAI Five’s strength comes more from teamwork and coordination than reflexes.

This new constraint is interesting. The Super Smash Bros. Melee AI paper noted that they had to keep reaction times at superhuman levels in order for the model to converge (though Dota is a rather different game from Melee): https://arxiv.org/abs/1702.06230


The author of that paper has since built an agent that has human-level reaction time and is competitive with professional players: http://youtube.com/vladfi1


Yes, but the fact that PPO can learn very long-range strategies, given enough computation, when most would expect it to diverge or fail to learn at all was already demonstrated by the original 5v5 Dota bot. That's probably what's happening here too: it can handle the long-range learning and so does fine with human-level reaction times, while the SSBM AI is stuck learning short-term strategies that rely heavily on simple, fast reactive policies.


There's nothing about PPO that helps it learn long-range strategies. It primarily lets you take multiple gradient steps on a single batch of experience, so you converge faster.

In fact, for a single step with no policy lag, it's equivalent to a standard policy gradient update.
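To make that equivalence concrete, here's a minimal sketch of PPO's clipped surrogate objective (function and parameter names are mine, not from any particular codebase). With no policy lag, the importance ratio pi_new/pi_old is exactly 1 at the first gradient step, the clip is inactive, and the objective reduces to the standard policy-gradient surrogate ratio * advantage:

```python
import numpy as np

def ppo_clip_loss(ratio, advantage, eps=0.2):
    """PPO clipped surrogate objective (to be maximized).

    ratio: pi_new(a|s) / pi_old(a|s) for the sampled action.
    advantage: estimated advantage of that action.
    """
    return np.minimum(ratio * advantage,
                      np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantage)

# ratio == 1.0 (no policy lag): the clip does nothing, so this is just
# the vanilla policy-gradient surrogate, advantage * 1.0.
no_lag = ppo_clip_loss(1.0, 2.0)       # equals the advantage, 2.0

# ratio drifts outside [1-eps, 1+eps]: the clipped term caps the
# objective, which is where PPO differs from a plain PG update.
clipped = ppo_clip_loss(1.5, 2.0)      # capped at 1.2 * 2.0 = 2.4
```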

DeepMind was also able to train a CTF agent with human-level reaction time: https://deepmind.com/blog/capture-the-flag/

I suspect the difference that allows you to train with reaction time is an RNN or compensating for the lag some other way. I'm testing that out right now with my own SSBM bot: https://www.twitch.tv/vomjom
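One simple way to impose a human-like reaction time during training is to buffer observations so the agent always acts on frames from a few steps ago; an RNN can then learn to compensate for the lag. This is an illustrative sketch only (the wrapper, class names, and toy environment are my own, not from the bot linked above):

```python
from collections import deque

class DelayedObservation:
    """Wrap a gym-style env so the agent sees observations `delay`
    steps late, simulating a fixed reaction time."""

    def __init__(self, env, delay):
        self.env = env
        self.delay = delay
        self.buffer = deque()

    def reset(self):
        obs = self.env.reset()
        # Pre-fill so the first `delay` actions all see the initial frame.
        self.buffer = deque([obs] * (self.delay + 1))
        return self.buffer.popleft()

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        self.buffer.append(obs)          # newest frame goes in...
        return self.buffer.popleft(), reward, done, info  # ...stale frame comes out

# Toy usage: an "environment" whose observation is just the step count.
class CounterEnv:
    def __init__(self):
        self.t = 0
    def reset(self):
        self.t = 0
        return 0
    def step(self, action):
        self.t += 1
        return self.t, 0.0, False, {}

env = DelayedObservation(CounterEnv(), delay=2)
first = env.reset()            # agent sees frame 0
delayed, *_ = env.step(None)   # true frame is 1; agent still sees frame 0
```

A recurrent policy trained inside such a wrapper can, in principle, learn to predict the current state from the stale observation stream, which is one hypothesis for why some agents tolerate added delay and others don't.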


> There's nothing about PPO that helps it learn long-range strategies.

Exactly, which is why it's so surprising that it learned them anyway, despite that and despite discount rates that assign almost no value to rewards more than a minute or so out.
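As rough arithmetic on why a discount rate caps the planning horizon (the gamma and action rate below are illustrative, not OpenAI's published numbers): the effective horizon of a discount factor gamma is on the order of 1 / (1 - gamma) steps.

```python
def horizon_steps(gamma):
    """Effective planning horizon, in environment steps, of discount gamma."""
    return 1.0 / (1.0 - gamma)

def horizon_seconds(gamma, actions_per_second):
    """Same horizon converted to wall-clock time at a given action rate."""
    return horizon_steps(gamma) / actions_per_second

# gamma = 0.99 -> ~100 steps; at, say, 7.5 actions/sec that's ~13 seconds.
# gamma = 0.998 -> ~500 steps, or roughly a minute at the same rate.
```

Any strategy whose payoff arrives beyond that horizon contributes almost nothing to the gradient, which is what makes long-range play emerging anyway so striking.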

> DeepMind was also able to train a CTF agent with human-level reaction time: https://deepmind.com/blog/capture-the-flag/

Note that the CTF agent is far more complex, featuring multi-level RL and evolved losses, and even a DNC inside the agents.

