BMList – A list of big pre-trained models (GPT-3, DALL-E2...) (github.com/openbmb)
55 points by fishingboy on July 30, 2022 | 3 comments



I can think of many specialized applications where this versatility is superfluous, but the model's size prohibits inference on the edge.

Do you know if there are any methods available for shrinking a fine-tuned derivative of such a big model?

Besides generating a specialized corpus with the big model and then training a smaller model on it, is there a more direct way to reduce the matrix dimensions while optimizing for a more specific inference problem? How far can we scale down before we need a different network topology?


You can quantize the model to 8-bit integer tensors instead of 16-bit bfloats or 32-bit floats. Nvidia's latest GPUs have dedicated hardware for fast inference with 8-bit quantization, and it shrinks the model's memory footprint by 2-4x. There are other tricks, like sparse tensors, which have been applied to language models and can reduce the memory overhead by 10-100x.
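
As a concrete illustration, here is a minimal sketch of post-training dynamic quantization in PyTorch (the toy model and layer sizes are placeholders, not from any particular pre-trained model):

    import torch
    import torch.nn as nn

    # Toy stand-in for a trained transformer's feed-forward layers.
    model = nn.Sequential(
        nn.Linear(768, 3072), nn.ReLU(), nn.Linear(3072, 768)
    )

    # Replace nn.Linear weights with int8 tensors; activations are
    # quantized on the fly at inference time.
    quantized = torch.quantization.quantize_dynamic(
        model, {nn.Linear}, dtype=torch.qint8
    )

    x = torch.randn(1, 768)
    print(quantized(x).shape)  # same interface, ~4x smaller weights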

See also: "From Dense to Sparse: Contrastive Pruning for Better Pre-trained Language Model Compression"
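
If you want to experiment with sparsity, PyTorch also ships magnitude-pruning utilities. A rough sketch (the single layer and the 90% sparsity level are arbitrary choices for illustration, not the paper's method):

    import torch.nn as nn
    import torch.nn.utils.prune as prune

    layer = nn.Linear(768, 768)

    # Zero out the 90% of weights with the smallest L1 magnitude.
    prune.l1_unstructured(layer, name="weight", amount=0.9)

    # Fold the pruning mask into the weight tensor permanently.
    prune.remove(layer, "weight")

Note that the tensor stays dense (just full of zeros); realizing the 10-100x memory savings requires sparse storage formats or kernels on top of this.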


As far as I know, there are many ways to compress a model, such as quantization, pruning, and knowledge distillation.
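
Knowledge distillation is essentially what you described: train a small student to match a big teacher. A minimal sketch of the standard Hinton-style distillation loss in PyTorch (the temperature T and mixing weight alpha are illustrative defaults):

    import torch
    import torch.nn.functional as F

    def distillation_loss(student_logits, teacher_logits, labels,
                          T=2.0, alpha=0.5):
        # Soft targets: match the teacher's softened output distribution.
        soft = F.kl_div(
            F.log_softmax(student_logits / T, dim=-1),
            F.softmax(teacher_logits / T, dim=-1),
            reduction="batchmean",
        ) * (T * T)  # rescale to keep gradient magnitudes comparable
        # Hard targets: ordinary cross-entropy on the true labels.
        hard = F.cross_entropy(student_logits, labels)
        return alpha * soft + (1 - alpha) * hard

    # Example: batch of 8, vocabulary of 1000 classes.
    student = torch.randn(8, 1000)
    teacher = torch.randn(8, 1000)
    labels = torch.randint(0, 1000, (8,))
    print(distillation_loss(student, teacher, labels))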

By the way, while browsing the OpenBMB repo I found a package called BMCook, which implements several compression algorithms and compares them with other model compression packages. Hope this helps.



