That's just the default. You can set max_seq_len to 8192. From the readme [1]:
> All models support sequence length up to 8192 tokens, but we pre-allocate the cache according to max_seq_len and max_batch_size values. So set those according to your hardware.
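For example, a minimal sketch assuming the repo's Llama.build entry point (the signature used in its example scripts); the checkpoint and tokenizer paths are placeholders:

  from llama import Llama

  generator = Llama.build(
      ckpt_dir="Meta-Llama-3-8B-Instruct/",
      tokenizer_path="Meta-Llama-3-8B-Instruct/tokenizer.model",
      max_seq_len=8192,   # use the full 8k context instead of the default
      max_batch_size=4,   # KV cache is pre-allocated for this batch size
  )

Larger max_seq_len and max_batch_size values mean a bigger pre-allocated KV cache, so size them to fit your GPU memory.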
[1] https://github.com/meta-llama/llama3/blob/14aab0428d3ec3a959...
[2] https://github.com/meta-llama/llama3/blob/14aab0428d3ec3a959...