LoRA+: Efficient Low Rank Adaptation of Large Models

batterseapower · 2024-04-28T14:42:44 1714315364

The other recent improvement suggested for LoRA is DoRA: https://magazine.sebastianraschka.com/p/lora-and-dora-from-s.... It really does seem to strongly outperform LoRA - see also https://www.answer.ai/posts/2024-04-26-fsdp-qdora-llama3.htm...

josalhor · 2024-04-28T15:55:03 1714319703

I just skimmed over LoRA+ and DoRA and I see no reason why these improvements could not go hand in hand. Actually, LoRA+ seems to be about efficient training while DoRA seems about improving the ability to actually learn, making it significantly more robust. Although I still have my questions on how the improvements of LoRA+ would be applied to the magnitude vector.

WithinReason · 2024-04-28T15:50:19 1714319419

The two methods seem to be independent, wonder if you can combine them for even better performance.

Interestingly both seem to indirectly modify the optimisation process, in my opinion effectively trying to fix a bad optimiser. Seems like we still have a long way to go after Adam...

neodypsis · 2024-04-28T19:02:19 1714330939

> Seems like we still have a long way to go after Adam...

A preprint in arxiv suggests that Adam works better than SGD for training LLMs due to the issue of class-imbalance [0]. It appears that scaling the gradient step helps with the training, for example, see another approach suggested in [1].

0. https://arxiv.org/pdf/2402.19449 1. https://arxiv.org/pdf/2402.02347

Ger_Onimo · 2024-04-28T16:17:51 1714321071

I've just started playing with DoRAs for fine-tuning TTS models towards particular styles of speech, and they're working extremely well!

allpaca · 2024-04-28T17:44:34 1714326274

Can you tell us more about it? Have you reported the results of your experiments in a post?

mysfi · 2024-04-28T19:28:32 1714332512

Count me interested here as well, specially if it is about the style of speech. I had a fun project in mind that involved the style of speech.

cooljoseph · 2024-04-28T22:37:42 1714343862

Those blog posts are pretty bad. Just read the original paper, https://arxiv.org/pdf/2402.09353. The key section is 4.1.

cuuupid · 2024-04-28T14:40:20 1714315220

I’m struggling to understand from this paper whether the approach is better in the general sense (all cases, with wider models seeing greater benefits) or purely for wider models (with narrower models seeing detriment)?

If it’s the former this could effectively halve finetuning cost overnight which would go a significant way towards enabling a wider array of use cases for LoRA.

ironbound · 2024-04-28T16:58:32 1714323512

I've had sucess with GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection https://arxiv.org/abs/2403.03507

Scipio_Afri · 2024-04-29T05:40:19 1714369219

This uses less memory so you can do fine tuning or hardware with less vram but at a cost of taking longer on training - there is a throughput penalty, the paper detailing the technique shows something like a 15% decrease in throughput.

youssefabdelm · 2024-04-28T16:25:18 1714321518

A better name would've probably been FastLoRA or something

mobilemidget · 2024-04-28T20:21:49 1714335709

I was expecting to read about LOng RAng radio communication.

https://en.wikipedia.org/wiki/LoRa

throwaway2562 · 2024-04-28T16:37:27 1714322247

fLORA

yau8edq12i · 2024-04-28T14:35:48 1714314948

What an unfortunate name... I initially thought this was about wireless communication. https://en.wikipedia.org/wiki/LoRa

sorenjan · 2024-04-28T15:02:09 1714316529

This gets mentioned here everytime an article about LoRA is posted. Sometimes acronyms means multiple things, they're not in the same field so the risk of confusion beyond short headlines is negligible.

It's a bit like if someone reading a bicycling article and getting annoyed that FTP means Functional Threshold Power instead of File Transfer Protocol, or reading about machine learning and getting confused that MLP doesn't mean My Little Pony.

rakoo · 2024-04-28T15:06:33 1714316793

"computer science" and "bicycles" aren't the same domain, it's fine to have the same acronym.

"computer science" and "tv shows" aren't the same domain, it's fine to have the same acronym.

"computer science" and "computer science" are the same domain, it's not a good idea to use the same acronym.

dragonwriter · 2024-04-28T15:50:03 1714319403

> "computer science" and "computer science" are the same domain, it's not a good idea to use the same acronym.

But “radio communication" is not “computer science”, even though people sometimes plug radio transceivers into computers, just like “tv shows” aren't “computer science” just because people sometimes view or store their shows on a computer, and “bicycles” aren’t “computer science” because sometimes people mount computers on their bikes.

WithinReason · 2024-04-28T15:09:55 1714316995

Large Models is in the title so it's obviously not about radio

GuB-42 · 2024-04-28T15:18:42 1714317522

The acronym is also spelled out in the title: LoRA = Low Rank Adaptation.

rakoo · 2024-04-28T19:15:19 1714331719

Large models is not spelled in full, and doesn't explicitely says it's not about the communication protocol.

squigz · 2024-04-28T15:31:17 1714318277

Context is hard!

rakoo · 2024-04-28T19:16:52 1714331812

So instead of LoRa and anything else, everyone now has to say LoRa (the communication protocol) or LoRa (the large model thing). Having to add context all the time makes everything so much simpler !

squigz · 2024-04-28T20:47:50 1714337270

Or potentially include the necessary context in the title of the post.

rakoo · 2024-04-29T21:08:20 1714424900

Or just pick another name

squigz · 2024-04-30T03:45:34 1714448734

That does seem to be more reasonable than expecting people to pick up on context

WithinReason · 2024-04-28T19:39:01 1714333141

Low rank adaptation is abbreviated LoRA

nostrademons · 2024-04-28T16:03:56 1714320236

"Computer science" isn't really one domain anymore - the field split into several subdomains in the 2010s. Just try to get a job as a "computer scientist" now - the recruiter would be like "No, are you a web developer? Mobile developer? Backend developer? Data scientist? Data engineer? Cloud engineer? AI engineer? Machine learning developer?"

archgoon · 2024-04-28T15:45:52 1714319152

LoRa is radio communication so part of EE, not CS.

Otherwise, I'm going to toss TV shows as part of CS, since Amazon Prime and Netflix streaming are things. Also, it's the original programming.

the__alchemist · 2024-04-28T15:54:51 1714319691

I think the reason this keeps coming up is encoded in your second sentence, in conjunction with the HN medium: LoRa and LoRA are both, unfortunately, things that the target audience are likely to be interested in and/or knowledgeable with, but a general audience is not.

Also, both use a non-standard case mix.

1024core · 2024-04-28T16:11:22 1714320682

> Sometimes acronyms means multiple things

Exactly. Like WiFi: from ancient times it has meant "Wife's Fidelity".

teaearlgraycold · 2024-04-28T16:17:29 1714321049

Tell my WiFi love her

EGreg · 2024-04-28T16:22:12 1714321332

Finally! Some people have been screaming to change the acronym since 2001. But these tech bros group didn’t listen. Such hubris!

https://en.m.wikipedia.org/wiki/Crypto_naming_controversy

yau8edq12i · 2024-04-28T17:59:59 1714327199

> This gets mentioned here everytime an article about LoRA is posted.

I wonder why!

IshKebab · 2024-04-28T16:28:52 1714321732

Yes but radio protocols and AI methods are a lot closer than most overlapping acronyms. This is obvious from the fact that it gets mentioned every time an article about LoRA is posted.

mattlondon · 2024-04-28T16:38:08 1714322288

But these are clearly both in the same field as everyone keeps saying mentioning it here! So clearly there is confusion. It certainly tricked me on first reading - "ah cool - efficient lora+ that sounds cool... Ah wait no it's just some machine learning spam"

kcorbitt · 2024-04-28T15:21:02 1714317662

This specific variant "LoRA+" described in this paper is even harder to search for. I was doing some research on this technique recently and it turns out that "Lora+" matches with "Lora" in Discord search, which is quite unhelpful. :)

SquareWheel · 2024-04-28T15:54:12 1714319652

Discord search is one of the worst I've ever used. They remap words like "localization" to "local", which makes it impossible to search for more specific terms.

rytill · 2024-04-28T15:12:04 1714317124

That’s LoRa. This is LoRA.

bee_rider · 2024-04-28T14:37:47 1714315067

The idea of low rank approximations is not new, truncated SVDs for example have been used for many decades.

yau8edq12i · 2024-04-28T14:40:27 1714315227

The acronym LoRA used in the context of deep learning (2021) is about seven years younger than the radio communication protocol called LoRa (2014). Type "lora" in a search engine and see what you get.

kleiba · 2024-04-28T15:00:40 1714316440

In the first 10 search results, I now get a mix of results for either of the two technologies when searching with Google.

blamestross · 2024-04-28T14:48:26 1714315706

Yeah I would prefer they didn't offusicate the actually useful search term

allpaca · 2024-04-28T17:42:27 1714326147

This is old, having been released in February... Why do you talk about it now?

axpy906 · 2024-04-28T15:28:57 1714318137

In 2024 are folks still swapping out LoRA adapters? Is this still relevant?

ac2u · 2024-04-28T15:34:26 1714318466

Can’t tell if your tone is inquisitive or incredulous :)

If the later please point out the alternatives.

bckr · 2024-04-28T15:33:16 1714318396

Why would it not be?