- "Parameter Efficient" finetuning methods let you customize LLMs without having to train all the parameters
- But LoRA (the most popular method) didn't match full finetuning performance on some tasks
- DoRA closed the gap while still being very efficient
- Quantization (representing the original weights with fewer bits per parameter) makes things even more memory-efficient
- FSDP lets you spread the work over multiple GPUs, using less memory on each one.
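To make the "parameter efficient" point concrete, here's a minimal NumPy sketch of the LoRA idea (this is an illustration of the general technique, not Answer.AI's implementation; the sizes and scaling factor are made up for the example). The pretrained weight stays frozen, and only two small low-rank matrices are trained:

```python
import numpy as np

# Hypothetical sizes for illustration only.
d_out, d_in, r = 64, 64, 4   # r << d is the low-rank bottleneck

rng = np.random.default_rng(0)
W = rng.standard_normal((d_out, d_in))   # frozen pretrained weight

# LoRA trains only A and B; the effective weight is W + B @ A,
# so the update has rank at most r.
A = rng.standard_normal((r, d_in)) * 0.01
B = np.zeros((d_out, r))                 # B starts at zero: no change at init

def lora_forward(x, alpha=8):
    # x: (batch, d_in). Frozen path plus scaled low-rank path.
    return x @ W.T + (alpha / r) * (x @ A.T @ B.T)

x = rng.standard_normal((2, d_in))
# At init B is zero, so the output matches the frozen model exactly.
assert np.allclose(lora_forward(x), x @ W.T)

# Trainable parameters vs full finetuning of this layer:
print(A.size + B.size, "vs", W.size)  # prints: 512 vs 4096
```

Even in this toy layer you're training 512 numbers instead of 4096, and the gap grows with layer size; DoRA adds a per-column magnitude term on top of the same low-rank idea.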
The upshot is that where you previously needed, say, 8 fancy Nvidia A100s to fine-tune an LLM, you can now do so on a few 3090s. It might take a little longer, but you get something almost as good as (or in some cases possibly better than) the full finetuning equivalent.
Jeremy from Answer.AI here. Let me know if you have any questions or comments about this work. (Although I can't take any credit for it -- this is the work of Kerem Turgutlu!)