> …but, when you’ve got a massive context window like the GPT 35k, who cares?
If the extra instruction significantly improves the quality of the response (e.g. “Only respond in markdown” really does make a difference; you can see this when using the API), it’s probably worth it.
It’s only really an issue for smaller models like LLaMA, which have much smaller context windows.
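For what it’s worth, this is roughly what that looks like over the API; a minimal sketch using the OpenAI Python client, with the model name and the instruction wording as assumptions:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# A short system instruction like this only costs a handful of prompt tokens,
# but it noticeably changes the shape of the output.
response = client.chat.completions.create(
    model="gpt-4",  # assumed model name; substitute whatever you're actually using
    messages=[
        {"role": "system", "content": "Only respond in markdown."},
        {"role": "user", "content": "Summarise the trade-offs of long prompts."},
    ],
)
print(response.choices[0].message.content)
```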
> …but, when you’ve got a massive context window like the GPT 35k, who cares?
AIUI, prompt size still impacts the inference cost (the compute resources, even if you’re the first party and aren’t paying retail API pricing). And while the “you won’t have room left for work in your context window” problem is less severe with the bigger long-window models, the inference cost per token is higher for those models, so one way or another it’s a factor.
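To put rough numbers on it, you can count the prompt tokens yourself with tiktoken and multiply by whatever per-token rate applies; a back-of-the-envelope sketch, with the prices purely as placeholder assumptions:

```python
import tiktoken

# Count tokens the way the GPT-family tokenizer would.
enc = tiktoken.encoding_for_model("gpt-4")

boilerplate = "Only respond in markdown. Be concise. Cite sources where possible.\n"
prompt_tokens = len(enc.encode(boilerplate))

# Placeholder per-1K-token input prices; real rates depend on the model and change over time.
price_per_1k_short_ctx = 0.03
price_per_1k_long_ctx = 0.06

calls_per_day = 10_000
daily_cost_short = prompt_tokens * calls_per_day / 1000 * price_per_1k_short_ctx
daily_cost_long = prompt_tokens * calls_per_day / 1000 * price_per_1k_long_ctx

print(f"{prompt_tokens} extra tokens per call")
print(f"~${daily_cost_short:.2f}/day on the short-context model, "
      f"~${daily_cost_long:.2f}/day on the long-context one")
```

Even a one-line boilerplate instruction adds up across a lot of calls, and the long-window models charge more per token, which is the point above.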