
The title of that podcast isn't something I actually said (IIRC). I commented in that interview that I feel we should not consider pre-training and fine-tuning to be as separate as we do now.



So you're generally in favor of mixing training data rather than separating it into phases. But when I use pretrained weights (as you recommend instead of random initialization), I generally don't have access to whatever data the network was pretrained on by someone else, so I have to make do with my fine-tuning data, yes?

Thank you!


Yes.
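
A minimal sketch of what "continued pre-training" can look like in practice, assuming a Hugging Face transformers setup: load the pretrained checkpoint and keep training it with the same next-token-prediction objective on whatever corpus you do have. The model name, data file, and hyperparameters below are placeholders, not anything from the thread.

  # Continued pre-training: same causal LM objective, new (your own) data.
  from datasets import load_dataset
  from transformers import (AutoModelForCausalLM, AutoTokenizer,
                            DataCollatorForLanguageModeling, Trainer,
                            TrainingArguments)

  model_name = "gpt2"  # placeholder; any causal LM checkpoint works the same way
  tokenizer = AutoTokenizer.from_pretrained(model_name)
  tokenizer.pad_token = tokenizer.eos_token
  model = AutoModelForCausalLM.from_pretrained(model_name)

  # Your "fine-tuning" corpus, treated as just more pre-training text.
  dataset = load_dataset("text", data_files={"train": "my_domain_corpus.txt"})

  def tokenize(batch):
      return tokenizer(batch["text"], truncation=True, max_length=512)

  tokenized = dataset["train"].map(tokenize, batched=True, remove_columns=["text"])

  # mlm=False gives the standard next-token (causal LM) loss, i.e. the pre-training objective.
  collator = DataCollatorForLanguageModeling(tokenizer, mlm=False)

  args = TrainingArguments(
      output_dir="continued-pretraining",
      per_device_train_batch_size=4,
      num_train_epochs=1,
      learning_rate=5e-5,  # usually lower than the original pre-training rate
  )

  Trainer(model=model, args=args, train_dataset=tokenized,
          data_collator=collator).train()

The point of the sketch is that nothing structurally distinguishes this from the original pre-training loop except the data; whether you can also mix in general-domain text alongside your own depends on what you have access to.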


"The right way to fine-tune language models... is to actually throw away the idea of fine-tuning. There's no such thing. There's only continued pre-training."

:) I hope I didn't pervert your intent too much for clickbait or something; I thought it was the spirit of what you said.
