
Yes, see DeepMind RETRO:

> In our experiments on the Pile, a standard language modeling benchmark, a 7.5 billion parameter RETRO model outperforms the 175 billion parameter Jurassic-1 on 10 out of 16 datasets and outperforms the 280B Gopher on 9 out of 16 datasets.

https://www.deepmind.com/blog/improving-language-models-by-r...

That said, there hasn't been much follow-up research on it (or DeepMind isn't publishing it).

Annotated paper: https://github.com/labmlai/annotated_deep_learning_paper_imp...
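For intuition, here's a minimal sketch of the retrieval-augmented idea behind models like RETRO: embed a query, look up nearest-neighbour chunks in a text database, and condition generation on them. This is an illustration only, not DeepMind's implementation; the embed() placeholder and the prepend-to-prompt conditioning are simplifications (RETRO actually uses frozen BERT embeddings for retrieval keys and feeds neighbours through chunked cross-attention).

    # Minimal sketch of retrieval-augmented generation (illustrative, not RETRO's actual architecture).
    import numpy as np

    def embed(text: str) -> np.ndarray:
        # Placeholder embedding: a real system would use a trained encoder.
        rng = np.random.default_rng(abs(hash(text)) % (2**32))
        v = rng.standard_normal(128)
        return v / np.linalg.norm(v)

    class RetrievalIndex:
        def __init__(self, chunks: list[str]):
            self.chunks = chunks
            # Pre-compute unit-norm keys for every chunk in the database.
            self.keys = np.stack([embed(c) for c in chunks])  # shape (N, 128)

        def nearest(self, query: str, k: int = 2) -> list[str]:
            # Cosine similarity via dot product of unit vectors, top-k neighbours.
            scores = self.keys @ embed(query)
            top = np.argsort(scores)[::-1][:k]
            return [self.chunks[i] for i in top]

    def retrieval_augmented_prompt(index: RetrievalIndex, query: str) -> str:
        # Simplest possible conditioning: prepend retrieved neighbours to the prompt.
        neighbours = index.nearest(query)
        return "\n".join(neighbours) + "\n\n" + query

    if __name__ == "__main__":
        index = RetrievalIndex([
            "Gopher is a 280B parameter language model from DeepMind.",
            "RETRO retrieves from a trillion-token database at inference time.",
            "The Pile is a language modelling benchmark made up of multiple datasets.",
        ])
        print(retrieval_augmented_prompt(index, "How does RETRO use retrieval?"))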




The research is still ongoing, although it's perhaps lower-profile than what makes it into the press.

RETRO did get press, but it was not the first retrieval model, and in fact it was not SOTA when it was published; FiD was, and FiD later evolved into Atlas[0], published a few months ago.

[0]: https://github.com/facebookresearch/atlas



