Isn't the model attempting to conserve information during training? And isn't information a physical quantity?