I would start with David Silver's (DeepMind) YouTube lecture series to get an idea of what is and isn't possible.

Running an already trained reinforcement learning agent is relatively cheap (unless your model is massive).
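To give a rough sense of why, here is a minimal sketch of what "running" a small agent amounts to: one forward pass per decision. Everything in it is made up for illustration (the `make_policy` and `act` helpers, the layer sizes, and the random weights standing in for trained ones); it is not taken from any particular library or paper.

```python
import math
import random

def make_policy(obs_dim, hidden_dim, n_actions, seed=0):
    """Build a hypothetical two-layer policy. Random weights stand in for trained ones."""
    rng = random.Random(seed)
    w1 = [[rng.uniform(-1, 1) for _ in range(obs_dim)] for _ in range(hidden_dim)]
    w2 = [[rng.uniform(-1, 1) for _ in range(hidden_dim)] for _ in range(n_actions)]
    return w1, w2

def act(policy, obs):
    """Pick the highest-scoring action: one forward pass, no learning involved."""
    w1, w2 = policy
    hidden = [math.tanh(sum(w * x for w, x in zip(row, obs))) for row in w1]
    scores = [sum(w * h for w, h in zip(row, hidden)) for row in w2]
    return max(range(len(scores)), key=scores.__getitem__)

# Assumed sizes: 4 observation values, 16 hidden units, 2 actions.
policy = make_policy(obs_dim=4, hidden_dim=16, n_actions=2)
action = act(policy, [0.1, -0.2, 0.05, 0.0])
```

With these assumed sizes each decision is 4*16 + 16*2 = 96 multiply-adds, which is negligible. The cost only becomes a concern once the layers get very large.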

I suspect people aren't using it yet because it's (a) really difficult to get right in training, since even basic convergence isn't guaranteed without careful tuning, and (b) really difficult to guarantee reasonable behavior outside the scenarios you're able to reach in QA.

edit: Link to lecture series https://www.youtube.com/watch?v=2pWv7GOvuf0&list=PLqYmG7hTra...