Hacker News

We changed RoPE's theta from 10k to 1M and fine-tuned on sequences 16k tokens long.
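For readers unfamiliar with the knob being turned here: theta is the base of RoPE's geometric frequency progression, and raising it from 10k to 1M slows every rotation, so positions well past the original context length still map to distinct angles. A minimal sketch in PyTorch, assuming a Llama-style rotate-half layout; the function names and the head_dim value are illustrative, not the poster's actual code:

```python
import torch

def rope_tables(head_dim: int, max_pos: int, theta: float = 10_000.0):
    """Precompute cos/sin tables for rotary position embeddings."""
    # One rotation frequency per dimension pair: theta ** (-2i / head_dim)
    inv_freq = 1.0 / (theta ** (torch.arange(0, head_dim, 2).float() / head_dim))
    positions = torch.arange(max_pos).float()
    angles = torch.outer(positions, inv_freq)        # (max_pos, head_dim // 2)
    return angles.cos(), angles.sin()

def apply_rope(x, cos, sin):
    """Rotate query/key vectors x of shape (..., seq_len, head_dim)."""
    x1, x2 = x.chunk(2, dim=-1)                      # split into two halves
    return torch.cat((x1 * cos - x2 * sin, x2 * cos + x1 * sin), dim=-1)

# theta raised from the usual 1e4 to 1e6 before fine-tuning on 16k-token sequences
cos, sin = rope_tables(head_dim=128, max_pos=16_384, theta=1_000_000.0)
q = torch.randn(1, 8, 16_384, 128)                  # (batch, heads, seq, head_dim)
q_rot = apply_rope(q, cos, sin)
```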



Curious, what led you to adjust the parameter this way? Also, have you experimented with ALiBi [1], which claims better extrapolation than rotary positional encoding?

[1]: https://arxiv.org/abs/2108.12409 (charts on page two if you’re skimming)
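For context on that claim: ALiBi drops positional embeddings entirely and instead adds a head-specific penalty to the attention logits that grows linearly with query-key distance, which is what the paper credits for its extrapolation. A minimal sketch, assuming a power-of-two head count; the helper name is illustrative:

```python
import torch

def alibi_bias(num_heads: int, seq_len: int) -> torch.Tensor:
    """Per-head linear distance penalties added to attention logits."""
    # Head-specific slopes: geometric sequence 2^(-8/n), 2^(-16/n), ...
    # (this closed form matches the paper for power-of-two head counts)
    slopes = torch.tensor([2.0 ** (-8.0 * (h + 1) / num_heads) for h in range(num_heads)])
    pos = torch.arange(seq_len)
    # distance[i, j] = j - i: 0 on the diagonal, negative for keys in the past
    distance = (pos[None, :] - pos[:, None]).clamp(max=0).float()
    # bias[h, i, j] = -slope_h * (i - j): farther-back keys are penalised more
    return slopes[:, None, None] * distance[None, :, :]

# Added to the raw attention scores before the softmax, with no positional
# embeddings applied to the queries/keys at all:
# scores = q @ k.transpose(-2, -1) / head_dim ** 0.5 + alibi_bias(heads, seq)
```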


Undoubtedly, they have tried ALiBi…



