Are they just fine-tuning part of the model on the "unsupervised" portion of the training data? That's not entirely unfair, since it can be realistic: if you have a big corpus of data and a pre-existing model, you might well fine-tune the latter on the former. Still, it's a generous benchmark, and it doesn't reflect real-world "online" usage.
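For concreteness, the setup being questioned can be sketched roughly like this: take an existing model and run self-supervised (next-token) fine-tuning over the raw, unlabeled corpus before evaluating. Everything here is a hypothetical stand-in, not the actual benchmark code: `TinyLM` plays the role of the pre-existing model, and a tensor of random token IDs plays the role of the unlabeled corpus.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-in for the "pre-existing model": a tiny next-token LM.
# In the real setting this would be a pretrained checkpoint.
class TinyLM(nn.Module):
    def __init__(self, vocab_size=32, dim=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.rnn = nn.GRU(dim, dim, batch_first=True)
        self.head = nn.Linear(dim, vocab_size)

    def forward(self, tokens):
        h, _ = self.rnn(self.embed(tokens))
        return self.head(h)

def finetune_on_unlabeled(model, corpus, steps=200, lr=1e-3):
    """Self-supervised fine-tuning: predict the next token on raw
    sequences -- no labels needed, only the 'unsupervised' data."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    losses = []
    for _ in range(steps):
        inputs, targets = corpus[:, :-1], corpus[:, 1:]
        logits = model(inputs)
        loss = loss_fn(logits.reshape(-1, logits.size(-1)),
                       targets.reshape(-1))
        opt.zero_grad()
        loss.backward()
        opt.step()
        losses.append(loss.item())
    return losses

# An unlabeled "corpus": random token sequences standing in for real text.
corpus = torch.randint(0, 32, (8, 16))
losses = finetune_on_unlabeled(TinyLM(), corpus)
```

The point of contention is exactly this extra pass: the model gets to adapt to the evaluation corpus ahead of time, which an online system processing data as it arrives would not.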