You can run them as-is for general-purpose inference, or you can fine-tune them to get improved performance on specific use cases.
It's safe to assume they're worse at every task than larger models, so I wouldn't frame their use cases in terms of which tasks they can do better.
What's good about them is that they're smaller, so they can run on smaller, cheaper hardware. A typical example would be fine-tuning one and then running it on a local user device rather than in the cloud (see the sketch below). This should become more practical as hardware improves.
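As a rough illustration, here's a minimal sketch of local inference with a small open model via the Hugging Face transformers library. The model name is just an example, and a real on-device setup would likely quantize the weights to fit the hardware:

```python
# Minimal sketch: run a small model locally with Hugging Face transformers.
# The model name below is an example; swap in whatever small model fits your device.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen2.5-0.5B-Instruct"  # example ~0.5B-parameter model

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)  # fits in a few GB of RAM

prompt = "Summarize: small models trade capability for cost and can run on-device."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```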
Say you had very vertically trained models: 1,000 separate LLMs each trained on specialized data, plus router LLMs trained to predict which specialist is most likely to have the knowledge you need. That would work a bit like the way Wikipedia is interlinked, or hierarchical, or essentially like a database index over nested LLMs. Performance could scale higher with many highly focused models, at least that's my understanding of what's possible here. The sketch below shows the basic routing idea.
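A toy sketch of that routing idea in plain Python, with stub functions standing in for the specialist LLMs. Everything here (the topic names, the keyword-based router) is an assumption for illustration; a real system would route with embeddings or a trained router model:

```python
# Toy sketch of hierarchical routing over specialist models.
# The "models" are stub functions; in practice each would be a fine-tuned LLM,
# and the router would be an LLM or embedding classifier, not keyword matching.
from typing import Callable, Dict, List

def law_model(q: str) -> str:
    return f"[law specialist] answer to: {q}"

def med_model(q: str) -> str:
    return f"[medicine specialist] answer to: {q}"

def chem_model(q: str) -> str:
    return f"[chemistry specialist] answer to: {q}"

# Leaf level: highly focused specialists keyed by topic.
SPECIALISTS: Dict[str, Callable[[str], str]] = {
    "law": law_model,
    "medicine": med_model,
    "chemistry": chem_model,
}

# Index level: maps coarse domains to the topics beneath them,
# like a database index or Wikipedia's category links.
INDEX: Dict[str, List[str]] = {
    "professional": ["law", "medicine"],
    "science": ["chemistry", "medicine"],
}

def route(question: str) -> str:
    """Walk the index, then pick a specialist. Keyword matching stands in
    for what would really be a router LLM scoring each branch."""
    q = question.lower()
    domain = "science" if ("reaction" in q or "molecule" in q) else "professional"
    for topic in INDEX[domain]:
        if topic in q:
            return SPECIALISTS[topic](question)
    # Fall back to the first specialist under the chosen domain.
    return SPECIALISTS[INDEX[domain][0]](question)

print(route("What does contract law say about late delivery?"))
print(route("Which molecule drives this reaction?"))
```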