It's complicated, but basically it's because most of them use the Llama architecture. Meta all but set the standard for open-source LLMs when they released LLaMA 1, and anyone who tries to deviate from it runs into trouble because their models don't work with the hyper-optimized Llama runtimes.
Also, there's a lot of magic going on behind the scenes with the configs stored alongside GGUF/Hugging Face format models and the libraries that read them. There are different tokenizers, but they mostly follow the same standards.
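To make the config point a bit more concrete, here's a rough sketch in Python. It assumes a Hugging Face-style `config.json` (the `model_type` and `architectures` keys are real conventions there, but the values below are just illustrative), and `pick_runtime` / `LLAMA_RUNTIME_TYPES` are hypothetical stand-ins for whatever dispatch a real library does. It isn't how any specific library works, just the general idea: the loader reads the config, and if it says "llama", the fast path gets used.

```python
import json

# Illustrative config.json contents for a Llama-family model in
# Hugging Face format (values are examples, not from a specific release).
EXAMPLE_CONFIG = """
{
  "architectures": ["LlamaForCausalLM"],
  "model_type": "llama",
  "hidden_size": 4096,
  "num_attention_heads": 32,
  "num_hidden_layers": 32,
  "vocab_size": 32000
}
"""

# Hypothetical: the model types an optimized "llama runtime" knows how to run.
LLAMA_RUNTIME_TYPES = {"llama"}


def pick_runtime(config_text: str) -> str:
    """Hypothetical dispatcher: choose a code path based on model_type."""
    cfg = json.loads(config_text)
    model_type = cfg.get("model_type", "")
    if model_type in LLAMA_RUNTIME_TYPES:
        # Anything that declares itself as llama gets the optimized path.
        return "llama-optimized"
    # Anything else falls back to a slower, more generic implementation.
    return "generic"


if __name__ == "__main__":
    print(pick_runtime(EXAMPLE_CONFIG))            # llama-optimized
    print(pick_runtime('{"model_type": "gpt2"}'))  # generic
```

This is basically why deviating from the architecture hurts: if your `model_type` isn't one the runtime already understands, you don't get the fast path.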