Idea of transfer learning of deep representations for NLP tasks was before, but nobody was able to achieve it before ELMo.
If we are pedantic we can include the whole word2vec stuff. It's a shallow transfer learning.
Idea of transfer learning of deep representations for NLP tasks was before, but nobody was able to achieve it before ELMo.
If we are pedantic we can include the whole word2vec stuff. It's a shallow transfer learning.