I think it's mostly an organic process arising from the ecosystem.
My personal way of understanding it is this: the original sin of model weight format complexity is that NNs are both data and computation.
Representing the computation as data is the hard part, and that's where the simplicity falls apart. Do you embed the compute graph? If so, what do you do about different frameworks supporting overlapping but distinct sets of operations? Do you need the artifact to make training reproducible? Well, that's an even more complex computation that you have to serialize as data. And so on.
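To make that concrete, here is a toy sketch (plain Python, not any real format). The names `graph`, `OPS`, and `run` are made up for illustration. The weights round-trip through JSON with no decisions to make, while the computation only becomes data once you invent an op vocabulary, and a loader is stuck as soon as it meets an op it doesn't implement.

```python
import json

# Weights are plain data: any serializer can round-trip them.
weights = {"w": [[1.0, -2.0], [0.5, 3.0]], "b": [0.0, 1.0]}
restored = json.loads(json.dumps(weights))

# The computation is code. To ship it as data you need an agreed op vocabulary.
# This graph format is hypothetical, invented for this example.
graph = [
    {"op": "matmul", "args": ["x", "w"], "out": "h"},
    {"op": "add", "args": ["h", "b"], "out": "z"},
    {"op": "relu", "args": ["z"], "out": "y"},
]

def matmul(x, w):
    # x is a vector, w is a matrix with one row per input element.
    return [sum(xi * w[i][j] for i, xi in enumerate(x)) for j in range(len(w[0]))]

# The set of ops this particular "framework" happens to support.
OPS = {
    "matmul": matmul,
    "add": lambda a, b: [p + q for p, q in zip(a, b)],
    "relu": lambda a: [max(0.0, v) for v in a],
}

def run(graph, weights, x):
    # Interpret the serialized graph node by node.
    env = dict(weights, x=x)
    for node in graph:
        if node["op"] not in OPS:
            # The overlapping-but-distinct-ops problem in miniature.
            raise NotImplementedError(f"unsupported op: {node['op']}")
        env[node["out"]] = OPS[node["op"]](*(env[name] for name in node["args"]))
    return env[graph[-1]["out"]]

print(run(graph, restored, [1.0, 2.0]))
```

Swap `"relu"` for an op this loader has never heard of and the same file stops loading, even though the weights in it are perfectly readable.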