
Nope. LLMs have a limited context window partly because that's the chunk size in which training data is presented to them during training (and partly because self-attention's compute and memory costs grow quadratically with sequence length).
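
A minimal sketch of what that chunking looks like, assuming a corpus already tokenized into a flat list of ids (chunk_tokens and context_len are illustrative names, not any particular library's API):

    def chunk_tokens(token_ids, context_len=2048):
        # The concatenated corpus is sliced into fixed-length windows;
        # the model never trains on a dependency longer than context_len,
        # which is one reason the deployed context window is bounded.
        return [token_ids[i:i + context_len]
                for i in range(0, len(token_ids) - context_len + 1, context_len)]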

One of the reasons I'm feeling very bullish on LLMs is that if you look at the exact training process being used, it's full of what feels like very obvious low-hanging fruit. I suspect part of the reason training them is so expensive is that we do it in really dumb ways that would sound like a dystopian hell if you described them to any actual teacher. The fact that we can get such good results from such a terrible training procedure just by blasting through it with computational brute force strongly suggests that much better results should be possible once some of that low-hanging fruit starts being harvested.



