Hacker News

It's clear that QLoRA has opened up finetuning to a wider audience with limited compute, which is a good thing.

One thing I've wondered about: what are the drawbacks of using QLoRA? For example, if compute is not a constraint, I'm guessing one should skip QLoRA and finetune in full precision instead?

AFAIK, when a model is first quantized to NF4 (before finetuning begins), its performance is degraded from the baseline (see https://x.com/Tim_Dettmers/status/1661482614811918338?s=20).
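To make the degradation concrete, here's a minimal pure-Python sketch of blockwise absmax 4-bit quantization in the NF4 style. The 16 code values below are rounded approximations of the normal-quantile table from the QLoRA paper, included only for illustration; the block size and constants are assumptions, not anyone's actual implementation.

```python
import random

# Illustrative approximations of the 16 NF4 code values (quantiles of a
# standard normal, as described in the QLoRA paper). Rounded; not exact.
NF4_LEVELS = [
    -1.0, -0.6962, -0.5251, -0.3949, -0.2844, -0.1848, -0.0911, 0.0,
    0.0796, 0.1609, 0.2461, 0.3379, 0.4407, 0.5626, 0.7230, 1.0,
]

def quantize_block(block):
    """Absmax-scale a block into [-1, 1], then snap each value to the
    nearest of the 16 NF4 levels. Returns (level indices, scale)."""
    scale = max(abs(x) for x in block) or 1.0
    idxs = [min(range(16), key=lambda i: abs(x / scale - NF4_LEVELS[i]))
            for x in block]
    return idxs, scale

def dequantize_block(idxs, scale):
    """Recover the (lossy) approximation of the original block."""
    return [NF4_LEVELS[i] * scale for i in idxs]

random.seed(0)
weights = [random.gauss(0, 0.02) for _ in range(64)]  # one 64-value block
idxs, scale = quantize_block(weights)
recovered = dequantize_block(idxs, scale)
# The round trip is lossy: this mean absolute error is the "degradation
# from baseline" before any finetuning happens.
mae = sum(abs(w - r) for w, r in zip(weights, recovered)) / len(weights)
```

The key point: the quantization error is baked in at load time, and QLoRA never updates the quantized base weights, only the adapters.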

Dettmers shows that after finetuning on the dataset, the result is as good as full precision. But AFAIK the effects outside the finetuning data were never explored. Assuming the finetuning dataset is small, the model will mostly remain the degraded NF4 version, right? Or perhaps finetuning will even skew the model in weird ways (trying to compensate for quantization errors).
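This worry can be stated mechanically. In QLoRA the forward pass is the frozen, dequantized NF4 weight plus a trained low-rank update: y = x·W_nf4 + x·A·B. A minimal sketch (plain-Python matmuls; names are illustrative, not a library API):

```python
def lora_forward(x, w_dequant, A, B, scaling=1.0):
    """y = x @ W_nf4 + scaling * (x @ A) @ B.

    w_dequant: the frozen, lossy dequantized base weight.
    A, B: the only trained parameters (low-rank adapters).
    """
    def matmul(X, W):
        # Naive row-by-column matrix multiply over lists of lists.
        return [[sum(a * b for a, b in zip(row, col)) for col in zip(*W)]
                for row in X]
    base = matmul(x, w_dequant)
    delta = matmul(matmul(x, A), B)
    return [[b + scaling * d for b, d in zip(br, dr)]
            for br, dr in zip(base, delta)]
```

Since B starts at zero in LoRA, the model initially behaves exactly like the quantized base; gradient updates move A·B only in directions the finetuning data exercises. So on inputs far from that data, A·B plausibly contributes little and you are effectively running the degraded NF4 model — which is exactly the open question above.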

Anecdotally, models finetuned with QLoRA perform well. Does anyone have papers or a careful analysis of this?



