
PPO = Proximal Policy Optimization

[https://openai.com/blog/openai-baselines-ppo/]





The hero we need.


Thank you!


Indeed. I looked for the definition anywhere on the page but couldn’t find it. Even googling it initially failed. https://arxiv.org/abs/1707.06347


Yeah, did the same, then looked at the linked article. Its abstract:

     Proximal Policy Optimization (PPO) is a popular on-policy reinforcement learning algorithm but is significantly less utilized than off-policy learning algorithms in multi-agent settings. This is often due to the belief that on-policy methods are significantly less sample efficient than their off-policy counterparts in multi-agent problems. In this work, we investigate Multi-Agent PPO (MAPPO), a variant of PPO which is specialized for multi-agent settings. Using a 1-GPU desktop, we show that MAPPO achieves surprisingly strong performance in three popular multi-agent testbeds: the particle-world environments, the StarCraft multi-agent challenge, and the Hanabi challenge, with minimal hyperparameter tuning and without any domain-specific algorithmic modifications or architectures. In the majority of environments, we find that compared to off-policy baselines, MAPPO achieves strong results while exhibiting comparable sample efficiency. Finally, through ablation studies, we present the implementation and algorithmic factors which are most influential to MAPPO's practical performance.
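

In case it helps anyone else who got lost in the acronyms: the "proximal" part refers to PPO's clipped surrogate objective, which keeps each policy update close to the previous policy. Here's a minimal NumPy sketch of that clipped loss, just to illustrate the idea from the PPO paper linked above (not code from either article, and the toy numbers below are made up):

    import numpy as np

    def ppo_clip_loss(logp_new, logp_old, advantages, clip_eps=0.2):
        # Probability ratio between the current policy and the old policy
        ratio = np.exp(logp_new - logp_old)
        # Clipped surrogate objective: take the pessimistic (lower) bound
        unclipped = ratio * advantages
        clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
        # Negate so minimizing this loss maximizes the surrogate objective
        return -np.mean(np.minimum(unclipped, clipped))

    # Toy numbers, purely illustrative
    logp_new = np.log(np.array([0.30, 0.55, 0.10]))
    logp_old = np.log(np.array([0.25, 0.60, 0.15]))
    advantages = np.array([1.0, -0.5, 2.0])
    print(ppo_clip_loss(logp_new, logp_old, advantages))

MAPPO, as described in the abstract, is essentially this same objective applied per agent in cooperative multi-agent settings.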



