
> Any time you see one of those giant context window LLMs, you need to be asking what heuristics they added, what is getting correlated, and what is not getting correlated.

Exactly. The paper doesn't even contain any experiments with context windows over 32K tokens, presumably because the model doesn't really attend to tokens beyond that range at all. In practice it's just a 32K attention window with some "theoretical" opportunity to attend a bit further than that.
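The effective behavior being described — a fixed-size attention window regardless of the nominal context length — can be sketched as a sliding-window attention mask. This is a minimal illustration of the general technique, not the mechanism of any particular model; the `window` parameter here stands in for the 32K limit:

```python
import numpy as np

def sliding_window_mask(seq_len: int, window: int) -> np.ndarray:
    """Boolean causal mask: query position i may attend only to key
    positions j with i - window < j <= i. Tokens farther back than
    `window` positions are simply never attended to, no matter how
    long the sequence is."""
    i = np.arange(seq_len)[:, None]  # query positions (column vector)
    j = np.arange(seq_len)[None, :]  # key positions (row vector)
    return (j <= i) & (j > i - window)

mask = sliding_window_mask(seq_len=8, window=3)
# Each row has at most `window` True entries: position 7 can see
# positions 5, 6, 7 but nothing earlier, regardless of sequence length.
```

Stacking such layers lets information propagate a bit beyond the window (depth x window positions in principle), which is the "theoretical" longer reach mentioned above.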
