That's just the default. You can set max_seq_len to 8192. From the readme [1]:
> All models support sequence length up to 8192 tokens, but we pre-allocate the cache according to max_seq_len and max_batch_size values. So set those according to your hardware.
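For example, a minimal sketch assuming the repo's Llama.build entry point (the signature used in its example scripts); the checkpoint and tokenizer paths are placeholders:

  from llama import Llama

  generator = Llama.build(
      ckpt_dir="Meta-Llama-3-8B-Instruct/",
      tokenizer_path="Meta-Llama-3-8B-Instruct/tokenizer.model",
      max_seq_len=8192,   # use the full 8k context instead of the default
      max_batch_size=4,   # KV cache is pre-allocated for this batch size
  )

Larger max_seq_len and max_batch_size values mean a bigger pre-allocated KV cache, so size them to fit your GPU memory.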
[1] https://github.com/meta-llama/llama3/blob/14aab0428d3ec3a959...
[2] https://github.com/meta-llama/llama3/blob/14aab0428d3ec3a959...