You could possibly fine-tune it for narrow-domain tasks, like they did with TinyAgent: https://bair.berkeley.edu/blog/2024/05/29/tiny-agent/

I like the approach Apple seems to be taking: fine-tuned small models handle routine tasks and defer to larger off-device models for anything they can't confidently do. I imagine you could construct a training set that includes examples that should produce low-confidence answers, and add an output that is essentially a "call for help" option, so the model learns to choose it. Smaller models also mean you could run more of them in parallel and use another one to route requests to the appropriate expert.
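
To make the "call for help" idea concrete, here is a rough Python sketch of what the inference side might look like. Everything in it is an assumption for illustration: the `HELP` label, the `answer_or_defer` helper, the 0.7 threshold, and the `large_model` callback are all made up, and this is not how Apple or TinyAgent actually does it. The small model is assumed to emit one logit per label, with the extra help label learned from the low-confidence training examples.

```python
import math
from typing import Callable, Sequence

# Hypothetical extra class the small model is trained to emit when unsure.
HELP = "<call_for_help>"


def softmax(logits: Sequence[float]) -> list[float]:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def answer_or_defer(
    logits: Sequence[float],
    labels: Sequence[str],
    query: str,
    large_model: Callable[[str], str],
    threshold: float = 0.7,  # assumed value; would need tuning on held-out data
) -> str:
    """Return the small model's answer, or defer to the larger model.

    Defers when the small model picks the explicit help label, or when
    its top probability falls below the confidence threshold.
    """
    probs = softmax(logits)
    best = max(range(len(probs)), key=lambda i: probs[i])
    if labels[best] == HELP or probs[best] < threshold:
        return large_model(query)
    return labels[best]


if __name__ == "__main__":
    labels = ["set_timer", "send_message", HELP]

    def big(query: str) -> str:
        # Stand-in for a call to an off-device model.
        return f"deferred: {query}"

    # Confident prediction: handled by the small model.
    print(answer_or_defer([4.0, 0.5, 0.2], labels, "timer 5 min", big))
    # Near-uniform logits: top probability is too low, so it defers.
    print(answer_or_defer([1.0, 0.9, 0.8], labels, "something vague", big))
```

The explicit help label and the probability threshold are two separate safety nets here; you could use either one alone. The same pattern would work for the routing model, with expert names as the labels instead of actions.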