Yeah, but in the case of ELMo it was fine-tuning (training the pretrained language model and the task model together), not just transfer learning.



With ELMo, the pretrained biLM weights are frozen. Only the scalar mixing weights for the ELMo layers are tuned (along with the additional task-specific model on top, of course).
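
For concreteness, here's a minimal PyTorch-style sketch of that scalar mixing; the module and parameter names are illustrative, not the actual allennlp API:

    import torch
    import torch.nn as nn

    class ScalarMix(nn.Module):
        # ELMo-style combination: one softmax-normalized learnable scalar
        # per frozen biLM layer, plus a single global scale gamma.
        def __init__(self, num_layers):
            super().__init__()
            self.scalar_weights = nn.Parameter(torch.zeros(num_layers))
            self.gamma = nn.Parameter(torch.ones(1))

        def forward(self, layer_reprs):
            # layer_reprs: list of (batch, seq_len, dim) tensors from the
            # frozen pretrained biLM; only these scalars, gamma, and the
            # task model on top receive gradients.
            weights = torch.softmax(self.scalar_weights, dim=0)
            mixed = sum(w * h for w, h in zip(weights, layer_reprs))
            return self.gamma * mixed

This matches the paper's formula ELMo_k = gamma * sum_j s_j * h_{k,j}, with s = softmax(w): the biLM itself never gets gradient updates, only the mixing scalars do.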



