The future is using the best possible tool to drive your work. Won’t local models be systematically inferior to bigger commercial offerings for the next few years at least?
"The future is using the best possible tool to drive your work"
Not if that tool is censored and you need an uncensored version to do your work. Or maybe you have privacy concerns, or your company's policies forbid using anything hosted remotely or owned by another company, etc.
Maybe. I wonder if very narrow, multi-model systems might eventually deliver better performance and utility than monolithic models like GPT. Rather than paying for access to that, you might be better off investing in resources that train and learn on exactly what you're doing, rather than something general that is good at a lot of things but not incredible at your specific task.
Generally we have consistently found that the more "other"/general data an AI model is trained on, the better it performs on specific tasks. As in, an AI model trained to identify photos of all animals will perform better than an AI model trained only to identify breeds of dogs. Even at identifying breeds of dogs.
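To make that concrete, here's a minimal PyTorch/torchvision sketch of the two approaches (the architecture and breed count are placeholder choices, not from any particular paper): one model starts from broad ImageNet pretraining and only swaps its head for dog breeds, the other trains the identical network on breed data alone.

    import torch.nn as nn
    from torchvision import models

    NUM_BREEDS = 120  # placeholder, e.g. the Stanford Dogs label set

    # (1) Broadly pretrained: start from ImageNet weights, adapt only the head.
    general = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)
    general.fc = nn.Linear(general.fc.in_features, NUM_BREEDS)

    # (2) Narrow: same architecture, random init, breed images only.
    narrow = models.resnet50(weights=None)
    narrow.fc = nn.Linear(narrow.fc.in_features, NUM_BREEDS)

    # Fine-tune both on the same breed-labelled data; the claim above is that
    # (1) ends up more accurate, despite (2) seeing nothing but dogs.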
Taken to the extreme, we've found that training image models with "multi-modal" LLM capabilities improves their ability to identify dogs and the like. A lot of people don't realize that GPT-4 is actually multi-modal: while OpenAI has so far only allowed text input through the API, the model itself can also accept image input.
Note that we've moved on from ImageNet-style tests ("choose the most appropriate label for this image from 200 possible labels") to much more advanced "reasoning" tests[0]. PaLI[1] is potentially the SoTA here, but BEiT-3[2] may be a better example for my thesis. Notice that BEiT-3 is trained not just on images but also like an LLM, yet it outperforms purely image-trained models on pure-image tasks like object detection and semantic segmentation.
More importantly, it can understand a human question like "What type of flowers are in the blue buckets of this image?" and respond intelligently.
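You can try that style of interaction today with off-the-shelf tooling. Here's a hedged sketch using Hugging Face's visual-question-answering pipeline with ViLT as a stand-in (BEiT-3 itself isn't assumed to be available through this pipeline, and the image path is hypothetical):

    from transformers import pipeline
    from PIL import Image

    # Small open VQA model, used purely as a stand-in for this capability class.
    vqa = pipeline("visual-question-answering",
                   model="dandelin/vilt-b32-finetuned-vqa")

    image = Image.open("flower_market.jpg")  # hypothetical local photo
    answers = vqa(image=image,
                  question="What type of flowers are in the blue buckets?")
    print(answers[0]["answer"])  # answers come ranked with confidence scores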
I think that we'll reach "good enough", and that the commercial offerings won't have much tangible benefit over local models, at least for work that is simply "fancy autocomplete".
Currently you don't really use LLMs to design the structure, just to complete the implementation, and I think that will be very doable locally.
Local models can access anything on your filesystem without sending it over the network. It's easy to imagine certain tasks performing better because of that alone.
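As a rough sketch of both points, assuming llama-cpp-python and a local GGUF code model (the model file and project path are placeholders): the prompt is built straight from files on disk, and nothing crosses the network.

    from pathlib import Path
    from llama_cpp import Llama

    # Any locally downloaded GGUF code model; the path is a placeholder.
    llm = Llama(model_path="./codellama-7b.Q4_K_M.gguf", n_ctx=4096)

    # Context comes straight off the filesystem -- no upload step.
    module = Path("src/parser.py").read_text()  # hypothetical project file

    prompt = (
        "Here is an existing Python module:\n\n" + module +
        "\n\nComplete the next function in the same style:\n"
        "def parse_header(line: str) -> dict:\n"
    )
    completion = llm(prompt, max_tokens=256, stop=["\ndef ", "\nclass "])
    print(completion["choices"][0]["text"])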