Not necessarily. You don't have to derive a domain-specific model from a general one; you can definitely build a pretrained domain-specific model from scratch by training it only on domain-specific data, which can result in a smaller and more efficient model than the general one.
Furthermore, when building task-specific models, an 'encoder' architecture (similar to BERT) often works better than a 'decoder' architecture (similar to GPTx), so you might want a different architecture from the general model, which is intended to be conversational/generative anyway.
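To make that concrete, here's a rough sketch of what "pretrain an encoder from scratch on domain data" looks like, assuming HuggingFace Transformers/Datasets. The ./domain-tokenizer path, domain_corpus.txt file, and the layer/head sizes are placeholder assumptions, not recommendations:

```python
# Sketch: pretraining a compact BERT-style encoder from scratch on a
# domain-specific corpus with masked-language modeling. No general-purpose
# pretraining is involved; the model starts from random weights.
from transformers import (
    BertConfig, BertForMaskedLM, BertTokenizerFast,
    DataCollatorForLanguageModeling, Trainer, TrainingArguments,
)
from datasets import load_dataset

# Assumes a tokenizer has already been trained on the domain corpus and saved.
tokenizer = BertTokenizerFast.from_pretrained("./domain-tokenizer")

# A deliberately small encoder: far fewer parameters than a general LLM.
config = BertConfig(
    vocab_size=tokenizer.vocab_size,
    hidden_size=384,
    num_hidden_layers=6,
    num_attention_heads=6,
    intermediate_size=1536,
)
model = BertForMaskedLM(config)  # randomly initialized

dataset = load_dataset("text", data_files={"train": "domain_corpus.txt"})["train"]
dataset = dataset.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=256),
    batched=True,
    remove_columns=["text"],
)

collator = DataCollatorForLanguageModeling(tokenizer, mlm_probability=0.15)
trainer = Trainer(
    model=model,
    args=TrainingArguments("domain-encoder",
                           per_device_train_batch_size=32,
                           num_train_epochs=3),
    train_dataset=dataset,
    data_collator=collator,
)
trainer.train()
```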
If you want to build a domain-specific classifier that determines whether an image is a dog or a cat, and you have 50 labeled images of dogs and cats, it's much better to start with a large model pretrained on millions of images, and then specialize it by fine-tuning on those 50 images.
Try starting from a randomly initialized NN with just those 50 images, and it won't work very well.
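For reference, a minimal sketch of that transfer-learning setup with PyTorch/torchvision: load a backbone pretrained on millions of images, freeze it, and train only a new 2-class head on the tiny labeled set. The dogs_vs_cats folder layout and the hyperparameters are placeholders:

```python
# Sketch: specializing a pretrained image model on a tiny labeled set
# (the ~50 dog/cat images from the example above).
import torch
import torch.nn as nn
from torchvision import datasets, models, transforms

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])

# Assumes an ImageFolder layout: dogs_vs_cats/{dog,cat}/*.jpg
train_set = datasets.ImageFolder("dogs_vs_cats", transform=preprocess)
loader = torch.utils.data.DataLoader(train_set, batch_size=8, shuffle=True)

model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
for p in model.parameters():                   # freeze the pretrained backbone
    p.requires_grad = False
model.fc = nn.Linear(model.fc.in_features, 2)  # new 2-class head: dog vs. cat

optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

model.train()
for epoch in range(5):
    for images, labels in loader:
        optimizer.zero_grad()
        loss = loss_fn(model(images), labels)
        loss.backward()
        optimizer.step()
```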
Sure, that's correct, but it's completely unrelated to what we were talking about; your example is about transfer learning from a general pretrained model to task-specific annotated data, not about domain-specific pretrained models.
For example, if you want a domain-specific model for the legal domain, you can pretrain a large self-supervised model on every legal document in the world you can get your hands on, instead of on a general mix of news, fiction, blogs, and everything else. That can be a more efficient starting point for your task-specific classifier, given however many (or few) annotated examples you have, than the general model.
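And then the few annotated examples you do have go into a short fine-tuning run on top of that domain-pretrained encoder. A sketch, again assuming HuggingFace Transformers/Datasets; the domain-encoder checkpoint (with its tokenizer saved alongside it), the annotated_clauses.csv file, and the 3-label setup are all hypothetical:

```python
# Sketch: fine-tuning the domain-pretrained encoder on a small annotated set
# to get a task-specific classifier.
from transformers import (
    AutoModelForSequenceClassification, AutoTokenizer,
    DataCollatorWithPadding, Trainer, TrainingArguments,
)
from datasets import load_dataset

checkpoint = "domain-encoder"  # output dir of the domain pretraining run
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
# The classification head is newly initialized; only the encoder is reused.
model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=3)

# A small annotated CSV with "text" and "label" columns (e.g. labeled clauses).
data = load_dataset("csv", data_files={"train": "annotated_clauses.csv"})["train"]
data = data.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=256),
    batched=True,
    remove_columns=["text"],
)

trainer = Trainer(
    model=model,
    args=TrainingArguments("clause-classifier",
                           per_device_train_batch_size=16,
                           num_train_epochs=5),
    train_dataset=data,
    data_collator=DataCollatorWithPadding(tokenizer),
)
trainer.train()
```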
Legal-related documents are a minuscule fraction of the corpus the large model is trained on. The resulting model won't have the conceptual fluency that the large model has. It's like training a human baby with legal briefs and expecting her to be a good lawyer.
The large general pretrained model is a prereq for domain-specific models.