- "Parameter Efficient" finetuning methods let you customize LLMs without having to train all the parameters
- But LoRA (the most popular method) didn't match full finetuning performance on some tasks
- DoRA closed the gap while still being very efficient
- Quantization (representing the original weights with fewer bits per parameter) makes things even more memory-efficient
- FSDP lets you spread the work over multiple GPUs, using less memory on each one.
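To make the "parameter efficient" point concrete, here's a minimal NumPy sketch of the LoRA idea (this is an illustration of the general technique, not Answer.AI's implementation; the sizes and scaling factor are made up for the example). The pretrained weight stays frozen, and only two small low-rank matrices are trained:

```python
import numpy as np

# Hypothetical sizes for illustration only.
d_out, d_in, r = 64, 64, 4   # r << d is the low-rank bottleneck

rng = np.random.default_rng(0)
W = rng.standard_normal((d_out, d_in))   # frozen pretrained weight

# LoRA trains only A and B; the effective weight is W + B @ A,
# so the update has rank at most r.
A = rng.standard_normal((r, d_in)) * 0.01
B = np.zeros((d_out, r))                 # B starts at zero: no change at init

def lora_forward(x, alpha=8):
    # x: (batch, d_in). Frozen path plus scaled low-rank path.
    return x @ W.T + (alpha / r) * (x @ A.T @ B.T)

x = rng.standard_normal((2, d_in))
# At init B is zero, so the output matches the frozen model exactly.
assert np.allclose(lora_forward(x), x @ W.T)

# Trainable parameters vs full finetuning of this layer:
print(A.size + B.size, "vs", W.size)  # prints: 512 vs 4096
```

Even in this toy layer you're training 512 numbers instead of 4096, and the gap grows with layer size; DoRA adds a per-column magnitude term on top of the same low-rank idea.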
The upshot is that where you previously needed, say, 8 fancy Nvidia A100s to fine-tune an LLM, you can now do so on a few 3090s. It might take a little longer, but you get something almost as good as (or in some cases possibly better than) the full finetuning equivalent.
Jeremy from Answer.AI here. Let me know if you have any questions or comments about this work. (Although I can't take any credit for it -- this is the work of Kerem Turgutlu!)