
Great news. No URLs for the models yet and nothing on https://huggingface.co/meta-llama . But the appendix has a really cool explanation of the RoPE algorithm and a visualization of what it means to rotate the token embedding's values by an angle. It really gives you an intuitive understanding of why the cosine similarity varies periodically, so that tokens every $x positions apart are more like each other. With the increased base frequency in this version of RoPE, that just means more spurious "similarity" between tokens every $x positions along in the positional encoding. I've never seen anyone exploit this to achieve a desired result yet, but it seems straightforward: pre-tokenize your input and then massage it so the tokens you want most strongly linked land $x positions apart.
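If you want to see the periodicity for yourself, here's a minimal NumPy sketch (not the paper's implementation; the base=10000 default, the 64-dim toy embedding, and the function names are my own assumptions). It rotates the same embedding to two different positions and prints the cosine similarity as the offset grows:

    import numpy as np

    def rope_rotate(x, pos, base=10000.0):
        # Apply RoPE to a single vector x at integer position pos.
        # Dimensions are paired (2i, 2i+1) and each pair is rotated
        # by pos * theta_i, where theta_i = base ** (-2i / d).
        d = x.shape[-1]
        assert d % 2 == 0
        i = np.arange(d // 2)
        theta = base ** (-2.0 * i / d)   # per-pair rotation frequencies
        angles = pos * theta
        cos, sin = np.cos(angles), np.sin(angles)
        x_even, x_odd = x[0::2], x[1::2]
        out = np.empty_like(x)
        out[0::2] = x_even * cos - x_odd * sin
        out[1::2] = x_even * sin + x_odd * cos
        return out

    def cos_sim(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

    rng = np.random.default_rng(0)
    x = rng.standard_normal(64)  # a stand-in token embedding

    # The similarity between the same embedding at position 0 and
    # position k depends only on the offset k, and it oscillates
    # rather than decaying monotonically as k grows.
    for k in (0, 1, 2, 4, 8, 16, 32, 64):
        print(k, round(cos_sim(rope_rotate(x, 0), rope_rotate(x, k)), 4))

The oscillation comes from the low-index pairs, which rotate fastest: each pair's contribution to the dot product cycles with period 2*pi/theta_i, so certain offsets bring many pairs back near alignment at once. Raising the base shrinks the high-index frequencies, which stretches those cycles out over longer contexts.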



Meta research has been putting out really great stuff. No wonder they're still rocking after all these years, despite the reputation Facebook has been racking up.



