As far as I know, there are many ways to compress a model, such as quantization, pruning, and knowledge distillation.
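To give a rough idea of what the first two look like, here is a minimal pure-Python sketch on a flat list of weights. It assumes symmetric per-tensor int8 quantization and simple magnitude pruning. The function names (`quantize_int8`, `dequantize`, `magnitude_prune`) and the sample weights are made up for illustration and are not taken from any library. Knowledge distillation is left out because it needs a training loop with a teacher model.

```python
def quantize_int8(weights):
    """Symmetric per-tensor quantization: map floats to integers in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    # One scale for the whole tensor; fall back to 1.0 if all weights are zero.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Recover approximate float weights from the quantized integers."""
    return [v * scale for v in q]


def magnitude_prune(weights, sparsity):
    """Zero out the `sparsity` fraction of weights with the smallest magnitude."""
    k = int(len(weights) * sparsity)
    # Indices ordered from smallest to largest absolute value.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    drop = set(order[:k])
    return [0.0 if i in drop else w for i, w in enumerate(weights)]


# Hypothetical weights, just for the demo.
weights = [0.5, -1.27, 0.03, 0.9, -0.02, 0.6]

q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
pruned = magnitude_prune(weights, 0.5)

print("quantized:", q)
print("restored: ", restored)
print("pruned:   ", pruned)
```

Quantization trades a small rounding error (at most about half of `scale` per weight) for a smaller storage type, while pruning keeps the surviving weights exact but sets the rest to zero. In practice both are usually followed by some fine-tuning to recover accuracy.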
By the way, I found a package called BMCook while browsing the OpenBMB repositories. It implements several model compression algorithms and also compares itself with other model compression packages. Hope this helps.