If it's learning relationships between concepts at runtime based on information in the context window then it seems about as useful to say it is a Markov chain as it is to say that a human is a Markov chain. Perhaps we are, but the "current state" is unmeasurably complex.
Well, all the information it learns at runtime is encoded in the context window.
I don't feel like {tokens}^ctxWindow is unmeasurably complex. I think one should see a transformer as a stochastic computer operating on its memory. If you modelled a computer as a stochastic process, would you take the state space to consist of only the most recent instruction, or of the whole memory of the computer?
GPT-4 has a context window of 32K tokens. I don't think GPT-4's vocabulary size has been released, but GPT-3's is about 50K (50,257 BPE tokens); the 175 figure is its parameter count, 175B. I guess yes, the complexity is technically measurable, but it does seem pretty large!
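To put a number on "pretty large": here's a back-of-the-envelope sketch of the state-space size if the Markov "state" is the entire context window. It assumes GPT-3's published vocabulary size (50,257 tokens) and a 32K-token window; the exact figures for GPT-4 aren't public, so treat this as an order-of-magnitude illustration.

```python
import math

# Assumed figures: GPT-3's BPE vocabulary and a 32K-token context window.
vocab_size = 50_257
ctx_window = 32_768

# The number of distinct Markov states is vocab_size ** ctx_window,
# far too large to print directly, so express it as bits of state:
# log2(vocab_size ** ctx_window) = ctx_window * log2(vocab_size).
state_bits = ctx_window * math.log2(vocab_size)
print(f"{state_bits:,.0f} bits of state")
```

That works out to roughly half a million bits of state, i.e. on the order of 10^150,000 distinct states. Measurable, but the transition matrix is never going to fit anywhere.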