I think training this as a separate head on top of a frozen AlphaZero model makes a lot of sense.
I don't think anyone has figured out how to do language learning with reinforcement training.
Actually, I can't figure out from your explanation why you trained the whole network yourself instead of just using Leela's network and training the commentary head on top?
If you wanted to incorporate the search, maybe you could just take the 1800 or so probabilities output by the MCTS and add some layers on top of them before concatenating with the other data fed into the transformer.
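Very roughly, something like this, where the shapes, layer sizes, and the exact move count (~1858 in Leela-style policy encodings) are all placeholder assumptions, not details from your setup:

```python
import numpy as np

rng = np.random.default_rng(0)

# ~1858 move probabilities from the MCTS (Leela-style move encoding;
# the exact count here is an assumption, as are all the widths below)
N_MOVES = 1858
HIDDEN = 256
EMBED = 128  # width of the tokens the transformer already consumes

# a couple of dense layers to compress the search output
W1 = rng.normal(0, 0.02, (N_MOVES, HIDDEN))
W2 = rng.normal(0, 0.02, (HIDDEN, EMBED))

def search_embedding(mcts_probs: np.ndarray) -> np.ndarray:
    """Map the MCTS visit distribution down to one extra transformer token."""
    h = np.maximum(mcts_probs @ W1, 0.0)  # ReLU
    return h @ W2

mcts_probs = rng.dirichlet(np.ones(N_MOVES))   # stand-in for real search output
board_tokens = rng.normal(size=(64, EMBED))    # stand-in for the existing inputs
tokens = np.concatenate([board_tokens, search_embedding(mcts_probs)[None, :]])
print(tokens.shape)  # (65, 128): original tokens plus one "search" token
```

In practice the two layers would be trained jointly with the commentary head, but the point is just that the search distribution becomes one more token alongside whatever the transformer already sees.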
In either case, this is a fantastic project and perhaps an even more impressive write up! Congrats and thank you!
It was partly because I was looking to improve self-play and training tractability on a home desktop with one GPU (a complete failure), and partly to learn everything from scratch. I would be interested to see how strong it is with the same search but with Leela's inference backend (for GPU at least) and network.
In terms of search-into-commentary, concatenating like that could be interesting, as long as the model can learn to map across - definitely plausible without too much work. I was originally thinking of something more complicated: combining multiple raw network outputs across the tree through some kind of trained weighting, or an additional model via recurrence, but I punted on it.
Ignore my BLEU comment - I mixed those up between replies. That was the other potential use of search trees for commentary: an MCTS/PUCT-style alternative to traditional sequential top-k/top-p sampling, once you have logits and are deciding which paragraph to generate.
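The per-step selection rule I had in mind is just the standard PUCT formula applied to candidate next tokens. A toy sketch (the constant, the candidates, and the rollout values are all made up for illustration):

```python
import math

def puct_select(priors, values, visits, c_puct=1.5):
    """Pick the child maximizing Q + U, AlphaZero's PUCT rule.

    priors: policy probabilities per candidate token (softmaxed LM logits),
    values: summed backed-up values per candidate (e.g. from rollouts),
    visits: visit counts per candidate.
    """
    total = sum(visits)
    best, best_score = None, -float("inf")
    for i, p in enumerate(priors):
        q = values[i] / visits[i] if visits[i] else 0.0
        u = c_puct * p * math.sqrt(total + 1) / (1 + visits[i])
        if q + u > best_score:
            best, best_score = i, q + u
    return best

priors = [0.7, 0.2, 0.1]  # softmaxed logits for three candidate tokens
print(puct_select(priors, [0.0, 0.0, 0.0], [0, 0, 0]))    # untried: follows prior -> 0
print(puct_select(priors, [-5.0, 0.0, 0.0], [10, 0, 0]))  # 0 backed up badly -> explore 1
```

Unlike top-k/top-p, which commits greedily token by token, this lets a bad-looking continuation be revisited if the backed-up value of the favored branch drops - you just need some value signal for partial paragraphs to back up through the tree.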