In designing software, there's often a trade-off between (i) generality/configurability and (ii) performance.
llama.cpp is built for inference, not for training or model architecture research. It seems reasonable to optimize for performance, which is what ~100% of llama.cpp users care about.
GGUF files seem to be proliferating. I think some folks (like myself) make the incorrect assumption that the format has more portability/generalizability than it actually does. Hence, the horror!
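To make the portability question concrete: the one part of GGUF that is genuinely simple and stable is the file header (the magic bytes `GGUF`, a version, a tensor count, and a metadata key/value count); everything after that is where format-version and alignment details start to bite. Here's a minimal sketch of parsing just that header, assuming a little-endian file as written by current tooling:

```python
import struct

def read_gguf_header(buf: bytes) -> dict:
    # GGUF files start with the 4-byte magic b"GGUF", then a uint32
    # format version, a uint64 tensor count, and a uint64 count of
    # metadata key/value pairs (little-endian in current versions).
    magic, version, n_tensors, n_kv = struct.unpack_from("<4sIQQ", buf, 0)
    if magic != b"GGUF":
        raise ValueError("not a GGUF file")
    return {
        "version": version,
        "tensor_count": n_tensors,
        "metadata_kv_count": n_kv,
    }

# Illustrative in-memory header: version 3, 2 tensors, 5 metadata entries.
sample = struct.pack("<4sIQQ", b"GGUF", 3, 2, 5)
print(read_gguf_header(sample))
```

Parsing the metadata and tensor data that follow is where the real coupling to llama.cpp's conventions lives, which is exactly the portability gap described above.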