snippyhollow on Aug 24, 2023 | on: Code Llama, a state-of-the-art large language mode...
We changed RoPE's theta from 10k to 1M and fine-tuned on sequences 16k tokens long.
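(Theta here is the base of RoPE's rotary frequencies: each pair of head dimensions rotates at a wavelength set by a power of theta, so a larger base stretches the longest wavelengths and keeps far-apart positions distinguishable. A minimal NumPy sketch of the frequency computation, assuming a standard RoPE formulation; the head dimension and sequence length below are illustrative, not Code Llama's actual configuration:)

    import numpy as np

    def rope_angles(head_dim: int, max_pos: int, theta: float) -> np.ndarray:
        """Rotation angles for standard RoPE: dim pair 2i rotates at theta^(-2i/d)."""
        inv_freq = theta ** (-np.arange(0, head_dim, 2) / head_dim)
        positions = np.arange(max_pos)
        return np.outer(positions, inv_freq)  # shape: (max_pos, head_dim // 2)

    # Base setting vs. the long-context fine-tune described above.
    angles_base = rope_angles(head_dim=128, max_pos=16_384, theta=10_000.0)
    angles_long = rope_angles(head_dim=128, max_pos=16_384, theta=1_000_000.0)

    # With theta = 1e6 the slowest-rotating pair completes far less of a
    # full turn over 16k positions, so long-range offsets stay unambiguous.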
malwrar on Aug 24, 2023
Curious, what led you to adjust the parameters this way? Also, have you guys experimented with ALiBi [1], which claims better extrapolation than rotary positional encoding?
[1]: https://arxiv.org/abs/2108.12409 (charts on page two if you’re skimming)
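(For contrast with RoPE: ALiBi skips position embeddings entirely and adds a head-specific linear penalty to the attention logits, proportional to query-key distance. A minimal sketch of the bias matrix, assuming the paper's geometric slope schedule for a power-of-two head count; not any particular model's implementation:)

    import numpy as np

    def alibi_bias(num_heads: int, seq_len: int) -> np.ndarray:
        """ALiBi bias added to attention logits: -slope * |i - j| per head."""
        # Slopes form a geometric sequence starting at 2^(-8/num_heads).
        slopes = 2.0 ** (-8.0 * np.arange(1, num_heads + 1) / num_heads)
        dist = np.abs(np.arange(seq_len)[:, None] - np.arange(seq_len)[None, :])
        return -slopes[:, None, None] * dist  # (num_heads, seq_len, seq_len)

    # Used as: logits = q @ k.T / sqrt(d) + alibi_bias(h, n) before softmax;
    # a causal mask is still applied separately.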
ttul on Aug 24, 2023
Undoubtedly, they have tried ALiBi…