I'd love for someone to do a good quality PyTorch enabled implementation of Sampled AlphaZero/MuZero [1]. RLLib has an AlphaZero, but it doesn't have the parallelized MCTS you really want to have and the "Sampled" part is another twist to it. It does implement a single player variant though, which I needed. This would be amazing for applying MCTS based RL to various hard combinatorial optimization problems. Case in point, AlphaTensor uses their internal implementation of Sampled AlphaZero.
An initial implementation might be doable in 5 hours for someone competent and familiar with RLLib's APIs, but could take much longer to really polish.
An initial implementation might be doable in 5 hours for someone competent and familiar with RLLib's APIs, but could take much longer to really polish.
[1]: https://arxiv.org/abs/2104.06303