I suspect that as well. However, other people have noted that 3.5-turbo-instruct does much better at generating legal chess moves than the other models. https://github.com/adamkarvonen/chess_gpt_eval gave models "5 illegal moves before forced resignation of the round"; 3.5-turbo-instruct made very few illegal moves, while GPT-4 lost most games due to illegal moves.
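
To make the "5 illegal moves before forced resignation" rule concrete, here is a rough sketch of how such a budget could work. This is not the code from that repo: the function name, the `is_legal` callback, and the choice to count illegal moves across the whole game and resign on the fifth one are all my assumptions for illustration.

```python
from typing import Callable, Dict, Iterable, List, Union

# Threshold taken from the quoted rule; how the repo counts it is an assumption here.
MAX_ILLEGAL_MOVES = 5


def play_with_illegal_move_budget(
    proposed_moves: Iterable[str],
    is_legal: Callable[[str], bool],
    max_illegal: int = MAX_ILLEGAL_MOVES,
) -> Dict[str, Union[bool, int, List[str]]]:
    """Accept legal moves; force resignation once max_illegal illegal moves are made.

    is_legal is a hypothetical stand-in for a real legality check
    (for example, one backed by a chess library).
    """
    illegal = 0
    accepted: List[str] = []
    for move in proposed_moves:
        if is_legal(move):
            accepted.append(move)
            continue
        illegal += 1
        # Assumption: the budget is per game and the fifth illegal move ends it.
        if illegal >= max_illegal:
            return {"resigned": True, "illegal": illegal, "accepted": accepted}
    return {"resigned": False, "illegal": illegal, "accepted": accepted}


# Toy usage: a fixed set of "legal" strings instead of a real board.
toy_legal = {"e4", "Nf3", "d4"}
result = play_with_illegal_move_budget(
    ["e4", "Zz9", "Qq1", "Nf3", "xx", "yy", "zz", "d4"],
    is_legal=lambda m: m in toy_legal,
)
```

Under a rule like this, a model that rarely proposes illegal moves almost never hits the budget, while a model that often does will lose games by forced resignation regardless of how strong its legal moves are.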