
In what circumstances is it only 10x slower? Random search is useless when your environment is stochastic. These algorithms aren't learning fixed sequences of actions; in fact, most use a random start of up to 30 'no-op' actions to prevent exactly that.



By 'random', he means evolutionary search. It's not really random; it's effectively a slower way of estimating a policy gradient. Here's the OpenAI blog post: https://blog.openai.com/evolution-strategies/
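
To make the "not really random" point concrete, here is a minimal sketch of an evolution-strategies update on a toy problem. It is an illustration under my own assumptions, not the code from the blog post: the names `evolution_strategy` and `toy_reward`, the hyperparameters, and the quadratic reward are all made up for the example. The idea is that Gaussian perturbations of the parameters, weighted by the reward each one earns, give a noisy estimate of the gradient of expected reward.

```python
import random


def toy_reward(params):
    # Hypothetical reward: higher is better, maximized at (0.5, -0.3).
    target = (0.5, -0.3)
    return -sum((p - t) ** 2 for p, t in zip(params, target))


def evolution_strategy(reward_fn, theta, sigma=0.1, lr=0.05,
                       population=50, iterations=200, seed=0):
    """Gradient ascent on reward using a perturbation-based gradient estimate."""
    rng = random.Random(seed)
    theta = list(theta)
    n = len(theta)
    for _ in range(iterations):
        # Sample one Gaussian perturbation per population member.
        noises = [[rng.gauss(0.0, 1.0) for _ in range(n)]
                  for _ in range(population)]
        # Evaluate the reward of each perturbed parameter vector.
        rewards = [reward_fn([t + sigma * e for t, e in zip(theta, eps)])
                   for eps in noises]
        # Subtract the mean reward as a baseline to reduce variance.
        baseline = sum(rewards) / population
        for i in range(n):
            # Estimate d(expected reward)/d(theta_i) as the
            # reward-weighted average of the noise, scaled by 1/sigma.
            grad_i = sum((r - baseline) * eps[i]
                         for r, eps in zip(rewards, noises))
            grad_i /= population * sigma
            theta[i] += lr * grad_i
    return theta


if __name__ == "__main__":
    print(evolution_strategy(toy_reward, [0.0, 0.0]))
```

Each step only needs reward evaluations, no backpropagation, which is why it parallelizes well, but every update costs `population` rollouts, which is where the slowdown relative to policy gradient comes from.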



