Glorified autocomplete? Autocomplete can sometimes guess the next word; GPT-3 can continue coherently for hundreds of words. On generic topics its output can be hard to distinguish from human text.
And it can't cache token representations, because every token is evaluated in the context of all the other tokens in the window, so the same token doesn't get the same representation when it recurs at a different position.
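To make that concrete, here's a minimal sketch with toy dimensions (nothing like GPT-3's real sizes): GPT-style models add a positional vector to the token embedding before any attention happens, so the same token id already enters the network as a different vector at each position.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, d_model, max_pos = 1_000, 8, 2048  # toy dimensions

token_embedding = rng.normal(size=(vocab_size, d_model))
position_embedding = rng.normal(size=(max_pos, d_model))

def input_vector(token_id: int, position: int) -> np.ndarray:
    """The vector the network actually sees for a token at a position."""
    return token_embedding[token_id] + position_embedding[position]

# Same token (id 42) at two different positions: different inputs,
# so everything computed downstream of it differs too.
print(np.allclose(input_vector(42, 0), input_vector(42, 7)))  # False
```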
They're evaluated in the context of the last N tokens, where N is typically a power of two (1024, 2048, or 4096 for many models) acting as a sliding window. Tokens (subword pieces, whole words, or punctuation) are represented by integer IDs, so the last N tokens would certainly qualify for storage in a cache. And next-token selection only has so many plausible candidates in any given language model because of grammatical constraints. That's only one such optimization; there could also be optimizations around the likelihood of certain words being used given the presence of certain previous tokens, and so on.
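A rough sketch of that caching idea, under the assumption that identical windows recur: since the window is just a tuple of integer token ids, it's hashable and can key a memoization table. `model_forward` here is a hypothetical stand-in for the expensive network evaluation, and all sizes are toys.

```python
from collections import deque
from functools import lru_cache

WINDOW = 2048  # e.g. 1024, 2048, or 4096 depending on the model
VOCAB = 1_000  # toy vocabulary size

def model_forward(window: tuple[int, ...]) -> tuple[float, ...]:
    """Hypothetical stand-in for the expensive network call.
    A real model returns probabilities conditioned on the whole window;
    a uniform distribution is used here as a placeholder."""
    return tuple([1.0 / VOCAB] * VOCAB)

@lru_cache(maxsize=10_000)
def next_token_distribution(window: tuple[int, ...]) -> tuple[float, ...]:
    # Identical windows hit the cache instead of re-running the model.
    return model_forward(window)

context = deque([0] * WINDOW, maxlen=WINDOW)  # sliding window of token ids
next_token_distribution(tuple(context))
next_token_distribution(tuple(context))       # second call is a cache hit
print(next_token_distribution.cache_info())   # hits=1, misses=1
```

Whether this pays off in practice depends on how often an exact window repeats, which for long windows is rare.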
But, yes, tokens are chosen one at a time based on the previous content, similar to earlier auto-completion algorithms.
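A minimal sketch of that one-token-at-a-time loop, with `next_token` as a hypothetical stand-in for the real model: each step conditions on everything generated so far (truncated to the window), appends one token, and repeats.

```python
import random

def next_token(context: list[int]) -> int:
    """Hypothetical stand-in: a real model scores every vocabulary entry
    given the context and samples from that distribution."""
    random.seed(sum(context))       # deterministic toy behaviour
    return random.randrange(1_000)  # toy vocabulary of 1,000 ids

def generate(prompt: list[int], n_new: int, window: int = 2048) -> list[int]:
    tokens = list(prompt)
    for _ in range(n_new):
        # Condition only on the last `window` tokens, as described above.
        tokens.append(next_token(tokens[-window:]))
    return tokens

print(generate([5, 17, 99], n_new=10))
```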