
PPO = Proximal Policy Optimization

[https://openai.com/blog/openai-baselines-ppo/]





The hero we need.


Thank you!


Indeed. I looked for the definition anywhere on the page but couldn’t find it. Even googling it initially failed. https://arxiv.org/abs/1707.06347


Yeah, did the same, then looked at the linked article. Its abstract:

     Proximal Policy Optimization (PPO) is a popular on-policy reinforcement learning algorithm but is significantly less utilized than off-policy learning algorithms in multi-agent settings. This is often due to the belief that on-policy methods are significantly less sample efficient than their off-policy counterparts in multi-agent problems. In this work, we investigate Multi-Agent PPO (MAPPO), a variant of PPO which is specialized for multi-agent settings. Using a 1-GPU desktop, we show that MAPPO achieves surprisingly strong performance in three popular multi-agent testbeds: the particle-world environments, the StarCraft multi-agent challenge, and the Hanabi challenge, with minimal hyperparameter tuning and without any domain-specific algorithmic modifications or architectures. In the majority of environments, we find that compared to off-policy baselines, MAPPO achieves strong results while exhibiting comparable sample efficiency. Finally, through ablation studies, we present the implementation and algorithmic factors which are most influential to MAPPO's practical performance.
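

In case it helps anyone else who got lost in the acronyms: the "proximal" part refers to PPO's clipped surrogate objective, which keeps each policy update close to the previous policy. Here's a minimal NumPy sketch of that clipped loss, just to illustrate the idea from the PPO paper linked above (not code from either article, and the toy numbers below are made up):

    import numpy as np

    def ppo_clip_loss(logp_new, logp_old, advantages, clip_eps=0.2):
        # Probability ratio between the current policy and the old policy
        ratio = np.exp(logp_new - logp_old)
        # Clipped surrogate objective: take the pessimistic (lower) bound
        unclipped = ratio * advantages
        clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
        # Negate so minimizing this loss maximizes the surrogate objective
        return -np.mean(np.minimum(unclipped, clipped))

    # Toy numbers, purely illustrative
    logp_new = np.log(np.array([0.30, 0.55, 0.10]))
    logp_old = np.log(np.array([0.25, 0.60, 0.15]))
    advantages = np.array([1.0, -0.5, 2.0])
    print(ppo_clip_loss(logp_new, logp_old, advantages))

MAPPO, as described in the abstract, is essentially this same objective applied per agent in cooperative multi-agent settings.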



