Yeah, but in the case of ELMo it was fine-tuning (training the pretrained language model and the task model together), not just transfer learning.



With ELMo, the pretrained biLM weights are frozen. Only the scalar mixing weights for the ELMo layers are tuned (along with the additional task-specific model on top, of course).
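
For concreteness, here's a minimal PyTorch-style sketch of that scalar mixing; the module and parameter names are illustrative, not the actual allennlp API:

    import torch
    import torch.nn as nn

    class ScalarMix(nn.Module):
        # ELMo-style combination: one softmax-normalized learnable scalar
        # per frozen biLM layer, plus a single global scale gamma.
        def __init__(self, num_layers):
            super().__init__()
            self.scalar_weights = nn.Parameter(torch.zeros(num_layers))
            self.gamma = nn.Parameter(torch.ones(1))

        def forward(self, layer_reprs):
            # layer_reprs: list of (batch, seq_len, dim) tensors from the
            # frozen pretrained biLM; only these scalars, gamma, and the
            # task model on top receive gradients.
            weights = torch.softmax(self.scalar_weights, dim=0)
            mixed = sum(w * h for w, h in zip(weights, layer_reprs))
            return self.gamma * mixed

This matches the paper's formula ELMo_k = gamma * sum_j s_j * h_{k,j}, with s = softmax(w): the biLM itself never gets gradient updates, only the mixing scalars do.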



