Something that stood out to me skimming the paper - that was somewhat buried - they finetune the model on each benchmark.
"Finally, for each individual task (benchmark), we fine-tune the PaLI-3 model with frozen ViT image encoder on the task’s training data as described in the cor- responding section. For most tasks, we fine-tune the 812×812 resolution checkpoint, but for two document understanding tasks, we go up to 1064×1064 resolution"
Are they just fine-tuning part of the model on the "unsupervised" portion of the training data? I think that's not entirely unfair, because it might be realistic: if you have a big corpus of data and a pre-existing model, you might well want to fine-tune the latter on the former. However, it's certainly a generous benchmark setup and doesn't reflect real-world "online" usage.
To fine-tune on each benchmark? I'd say it's not, in our modern era of in-context learning, though of course fine-tuning has its place as well, for making smaller models better in one domain than a larger generalist model.
"Finally, for each individual task (benchmark), we fine-tune the PaLI-3 model with frozen ViT image encoder on the task’s training data as described in the cor- responding section. For most tasks, we fine-tune the 812×812 resolution checkpoint, but for two document understanding tasks, we go up to 1064×1064 resolution"
[Struck through:] So this is comparing a smaller model fine-tuned per benchmark to larger models that I presume are not, though I have not read the PaLI-X paper.
Edit - No, I was wrong: PaLI-X is also fine-tuned before each task/set of tasks.
Impressive improvement!!!