Could you point me to evidence that what you just wrote is true?
I just asked Mistral 7B to provide five sentences that end with the word "Apple". It couldn't do it. I then provided five examples generated by ChatGPT-4 and asked it to generate five more. It still couldn't do it. With ten ChatGPT examples, it still couldn't do it.
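For reference, this is roughly how the test can be reproduced. A minimal sketch, assuming the mistralai/Mistral-7B-Instruct-v0.2 checkpoint via Hugging Face transformers and hand-written stand-ins for the ChatGPT-4 examples; the exact prompt wording and sampling settings I used may have differed.

```python
# Minimal sketch of the few-shot "sentences ending with Apple" test.
# Assumes the Mistral-7B-Instruct-v0.2 checkpoint; adjust model_id as needed.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mistral-7B-Instruct-v0.2"  # assumed checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# Few-shot examples (hand-written here; in the test above they came from ChatGPT-4).
examples = [
    "My favorite fruit to eat in autumn is a crisp Apple.",
    "She polished the shiny red Apple.",
    "For lunch he packed a sandwich and an Apple.",
    "The teacher smiled when she found the Apple.",
    "On the counter sat a single green Apple.",
]
prompt = (
    "Each of the following sentences ends with the word 'Apple':\n"
    + "\n".join(f"- {s}" for s in examples)
    + "\nWrite 5 more sentences that each end with the word 'Apple'."
)

messages = [{"role": "user", "content": prompt}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output = model.generate(inputs, max_new_tokens=200, do_sample=False)
completion = tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True)
print(completion)

# Quick check of how many generated lines actually end with "Apple".
hits = sum(1 for line in completion.splitlines() if line.rstrip(".!? ").endswith("Apple"))
print(f"{hits} sentences end with 'Apple'")
```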
You seem to be saying that the models can generalize over the entire context window, and that I should keep providing examples up to the token limit because this will make the model smarter. Is there evidence of that?