
It's not. It can do in-context learning, which Markov chains cannot do.



It is a Markov Chain on the state space {Tokens}^CtxWindow.
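To spell that out, here is a toy sketch (all names and sizes here are invented) of why the sampling loop is a Markov chain once you take the whole window as the state: the distribution of the next state depends only on the current window, never on tokens that have already fallen out of it.

    # Toy sketch: autoregressive sampling as a Markov chain on {tokens}^CTX.
    # next_token_weights is a hypothetical stand-in for the model; it only ever
    # sees the current window, so the next state depends on the current state alone.
    import random

    VOCAB, CTX = 10, 4   # toy vocabulary size and context window

    def next_token_weights(window):
        # stand-in for an LLM forward pass: any fixed function of the window
        return [hash((window, t)) % 100 + 1 for t in range(VOCAB)]

    def step(window):
        # one application of the transition kernel P(next window | current window)
        tok = random.choices(range(VOCAB), weights=next_token_weights(window))[0]
        return window[1:] + (tok,)   # shift: the new state is again a point in {tokens}^CTX

    state = (0,) * CTX
    for _ in range(5):
        state = step(state)          # the update reads nothing but `state`
    print(state)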


I don't think that's clear at all.

https://arxiv.org/abs/2212.10559 argues that an LLM is implicitly doing gradient descent on the context window at inference time (rough sketch of the mechanism below).

If it's learning relationships between concepts at runtime based on information in the context window, then it seems about as useful to say it is a Markov chain as it is to say that a human is a Markov chain. Perhaps we are, but the "current state" is unmeasurably complex.
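As I understand it, the duality that paper leans on is easy to reproduce in a toy setting: with the softmax dropped (the paper's linear-attention approximation), attending over the context is the same as applying to the query a weight matrix built from outer products of the context's values and keys, which has the same form as accumulated gradient-descent updates on a linear layer. A minimal numpy sketch; all names and dimensions here are made up:

    # Sketch of the linear-attention / weight-update duality in a toy setting.
    import numpy as np

    d, n = 8, 5                                  # toy feature size and context length
    rng = np.random.default_rng(0)
    K = rng.normal(size=(n, d))                  # keys from the in-context examples
    V = rng.normal(size=(n, d))                  # values from the in-context examples
    q = rng.normal(size=d)                       # the query token

    # linear attention (softmax removed): sum_i v_i * (k_i . q)
    attn_out = V.T @ (K @ q)

    # the same thing, read as "apply a context-built weight update to q"
    delta_W = sum(np.outer(V[i], K[i]) for i in range(n))   # sum_i v_i k_i^T
    assert np.allclose(attn_out, delta_W @ q)

    # Gradient descent on a linear layer y = W x also produces updates of this
    # shape: a sum of outer products (error_i, x_i). That structural match is
    # the sense in which the context acts like training data at inference time.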


Well, all the information it learns at runtime is encoded in the context window. I don't feel like {tokens}^ctxWindow is unmeasurably complex. I think one should see a transformer as a stochastic computer operating on its memory. If you modelled a computer as a stochastic process, would you take the state space to consist of the most recent instruction, or of the whole memory of the computer?
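To push the computer analogy, whether a process looks Markov depends entirely on what you pick as the state. A toy sketch (everything here is invented) where the full memory is a Markov state but the stream of emitted symbols on its own is not:

    # Toy machine with one hidden register: the register is a Markov state,
    # the emitted symbols alone are not.
    import random

    def run(steps=10):
        acc = 0                                        # the machine's whole memory
        out = []
        for _ in range(steps):
            acc = (acc + random.choice([1, 2])) % 4    # next memory depends only on memory
            out.append("A" if acc < 2 else "B")        # emitted symbol is a lossy view of memory
        return out

    print(run())
    # Knowing only the last symbol does not pin down the next one's distribution
    # (acc = 0 and acc = 1 both emit "A" but behave differently on the next step);
    # knowing the register does. Same point for an LLM: the whole window is the
    # memory, a single recent token is not.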


GPT-4 has a token window of up to 32K tokens. I don't think GPT-4's vocabulary size has been released, but GPT-3's is around 50K. I guess yes, the complexity is technically measurable, but it does seem pretty large!
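Back-of-the-envelope, assuming a GPT-3-sized vocabulary (50,257 BPE tokens) and the 32K window:

    # Rough size of the state space {tokens}^ctxWindow under assumed numbers.
    import math

    vocab, ctx = 50_257, 32_768        # assumed: GPT-3-sized vocab, 32K window
    digits = ctx * math.log10(vocab)   # log10 of vocab ** ctx
    print(f"roughly 10^{digits:,.0f} possible contexts")

So measurable in principle, but that's a state space whose size has about 150,000 digits; nobody is writing down that transition matrix the way you would for a classical Markov text generator.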



