Exactly. Which is why it's so surprising that it worked anyway, despite that and despite discount rates that give essentially no weight to rewards more than a minute or so ahead.
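To make the discount-rate point concrete: with a per-step discount factor γ, a reward k steps ahead is weighted by γ^k. That means its weight halves after ln(0.5)/ln(γ) steps, and 1/(1−γ) steps is a common rule-of-thumb horizon. The sketch below converts both into seconds. The γ and steps-per-second values in the usage line are hypothetical and are not the settings of any particular agent.

```python
import math


def discount_half_life_seconds(gamma: float, steps_per_second: float) -> float:
    """Seconds until a future reward's weight gamma**k drops to one half."""
    # Solve gamma**k = 0.5 for k (in environment steps), then convert to seconds.
    steps = math.log(0.5) / math.log(gamma)
    return steps / steps_per_second


def effective_horizon_seconds(gamma: float, steps_per_second: float) -> float:
    """Rule-of-thumb horizon of 1 / (1 - gamma) steps, expressed in seconds."""
    return (1.0 / (1.0 - gamma)) / steps_per_second


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    gamma, rate = 0.99, 10.0
    print(f"half-life: {discount_half_life_seconds(gamma, rate):.1f} s")
    print(f"horizon:   {effective_horizon_seconds(gamma, rate):.1f} s")
```

With γ = 0.99 at 10 steps per second, a reward's weight halves after about 6.9 seconds, so anything minutes away contributes almost nothing to the return.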
> DeepMind was also able to train a CTF agent with human-level reaction time: https://deepmind.com/blog/capture-the-flag/
Note that the CTF agent is way more complex: it features multi-level RL, evolutionary losses, and even a DNC in the agents.