There is a growing field of study that applies statistical physics tools to machine learning. While there is still a lot of work to do in this area, it has already yielded interesting insights into neural networks, e.g. linking their training to the dynamics of spin glasses (a classic statistical physics problem). We can even talk about phase transitions and universal exponents.
Most of the research is done on simpler models though (mainly because math people do it, and it's hard to prove anything about something as complex as a transformer).