
You're correct except for the use-cases where one of these comes into play:

A. Latency: for some systems, you need near real-time predictions. LLMs (today) are slow for that.

B. Cost: when the low development effort (for building and deploying an ML model) and low sample complexity (i.e., zero/few-shot) don't translate into proportionate monetary gains over what you pay for LLM usage.

C. Precision: when you want the model to reliably tell you when it doesn't know the correct answer. Hallucination is part of it, but I think of this requirement as the broader umbrella of good uncertainty quantification. I think there are two reasons why this is worse for LLMs: (1) traditional ML models also suffer from this, but there are some well-known ways to mitigate it (see the sketch after this list), while for LLMs there is still no universal or accepted way to do this reliably; (2) the fluency of the language an LLM generates seems to make it more likely to deceive you when it is wrong. I don't know how to think about this scientifically - maybe as LLMs proliferate, people will build appropriate mental defenses?
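
To make the mitigation point in (1) concrete, here is a minimal sketch of one common approach with a traditional ML model: calibrate the classifier's probabilities and abstain whenever confidence is below a threshold. The synthetic dataset, random forest, and 0.8 threshold are illustrative assumptions, not anything specific to the setups above.

    # Sketch: calibrate predict_proba, then abstain ("I don't know")
    # whenever the predicted confidence falls below a threshold.
    import numpy as np
    from sklearn.calibration import CalibratedClassifierCV
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split

    X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

    # Isotonic calibration pushes predict_proba closer to true frequencies.
    clf = CalibratedClassifierCV(
        RandomForestClassifier(random_state=0), method="isotonic", cv=5
    )
    clf.fit(X_tr, y_tr)

    confidence = clf.predict_proba(X_te).max(axis=1)
    threshold = 0.8  # tune on a validation set for the precision you need

    # -1 marks "I don't know"; everything else is a confident prediction.
    preds = np.where(confidence >= threshold, clf.predict(X_te), -1)
    answered = preds != -1
    print(f"abstained on {1 - answered.mean():.1%} of inputs")
    print(f"accuracy when answering: "
          f"{(preds[answered] == y_te[answered]).mean():.1%}")

The point is that "knowing when you don't know" is a thin layer on top of calibrated probabilities for a classical model; there is no equally standard knob for an LLM today.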

There is also the practical problem of prompt transferability across LLMs: what works best for one LLM might not work well for another, and there is no systematic way to adapt the original prompt optimally. This is painful in setups where you're trying to avoid lock-in. But I didn't put it in the list because it seems to be a problem for niche groups - everyone seems busy getting things working with one LLM. Maybe this will become a larger issue later.
