Are they just fine-tuning part of the model on the "unsupervised" portion of the training data? I don't think that's entirely unfair, since it may be realistic: if you have a big corpus of data and a pre-existing model, you might well want to fine-tune the latter on the former. Still, it's certainly a generous benchmark and doesn't reflect real-world "online" usage.