From a black box perspective, LLMs are pretty simple, you put text or images in, (possibly structured) text comes out, maybe with some tool invocations.
If you use a good library for this, like Python's litellm for example, all it takes is changing one string in your code or config, as the library exposes most APIs of most providers under a simple, uniform interface.
You might need to modify your prompt and run some evals on whatever task your app is solving, but even large companies regularly deprecate old models and introduce vastly better ones, so you should have a pipeline for that anyway.
These models have very little "stickiness" or lock-in. If your app is a Twitter client and is built around the Twitter API, turning it into a Mastodon client built around the Mastodon API would take a lot of work. If your app uses Grok and is designed properly, switching over to a different model is so simple that it might be worth doing for half an hour during an outage.
Prompt to Output quality vary by a large amount between models IMO. The equivalent analogy would "lets switch programming language for this solved problem".
The models are still of a level where for less common/benchmarked tasks, there's often only one model that's very good at it, and whichever is 2nd best is markedly worse, possibly to a degree where it's unusable for anything serious.
From a black box perspective, LLMs are pretty simple, you put text or images in, (possibly structured) text comes out, maybe with some tool invocations.
If you use a good library for this, like Python's litellm for example, all it takes is changing one string in your code or config, as the library exposes most APIs of most providers under a simple, uniform interface.
You might need to modify your prompt and run some evals on whatever task your app is solving, but even large companies regularly deprecate old models and introduce vastly better ones, so you should have a pipeline for that anyway.
These models have very little "stickiness" or lock-in. If your app is a Twitter client and is built around the Twitter API, turning it into a Mastodon client built around the Mastodon API would take a lot of work. If your app uses Grok and is designed properly, switching over to a different model is so simple that it might be worth doing for half an hour during an outage.