
The title of that podcast isn't something I actually said (IIRC). I commented in that interview that I feel we should not consider pre-training and fine-tuning to be as separate as we do now.



So you're generally in favor of mixing training data rather than separating it into phases. But when I use pretrained weights (as you recommend instead of random initialization), I generally don't have access to whatever data the network was pretrained on by someone else, so I have to make do with my fine-tuning data, yes?

Thank you!


Yes.
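
A minimal sketch of what "continued pre-training" can look like in practice, assuming a Hugging Face transformers setup: load the pretrained checkpoint and keep training it with the same next-token-prediction objective on whatever corpus you do have. The model name, data file, and hyperparameters below are placeholders, not anything from the thread.

  # Continued pre-training: same causal LM objective, new (your own) data.
  from datasets import load_dataset
  from transformers import (AutoModelForCausalLM, AutoTokenizer,
                            DataCollatorForLanguageModeling, Trainer,
                            TrainingArguments)

  model_name = "gpt2"  # placeholder; any causal LM checkpoint works the same way
  tokenizer = AutoTokenizer.from_pretrained(model_name)
  tokenizer.pad_token = tokenizer.eos_token
  model = AutoModelForCausalLM.from_pretrained(model_name)

  # Your "fine-tuning" corpus, treated as just more pre-training text.
  dataset = load_dataset("text", data_files={"train": "my_domain_corpus.txt"})

  def tokenize(batch):
      return tokenizer(batch["text"], truncation=True, max_length=512)

  tokenized = dataset["train"].map(tokenize, batched=True, remove_columns=["text"])

  # mlm=False gives the standard next-token (causal LM) loss, i.e. the pre-training objective.
  collator = DataCollatorForLanguageModeling(tokenizer, mlm=False)

  args = TrainingArguments(
      output_dir="continued-pretraining",
      per_device_train_batch_size=4,
      num_train_epochs=1,
      learning_rate=5e-5,  # usually lower than the original pre-training rate
  )

  Trainer(model=model, args=args, train_dataset=tokenized,
          data_collator=collator).train()

The point of the sketch is that nothing structurally distinguishes this from the original pre-training loop except the data; whether you can also mix in general-domain text alongside your own depends on what you have access to.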


"The right way to fine-tune language models... is to actually throw away the idea of fine-tuning. There's no such thing. There's only continued pre-training."

:) I hope I didn't pervert your intent too much for clickbait or something; I thought it was the spirit of what you said.
