I don't think that's quite correct. The underlying algorithms start by processing the prompt tokens in parallel (the "prefill" phase), and once that's done, the intermediate key/value (KV) vectors are cached and re-used by the attention layers on each new token's forward pass, so the prompt isn't re-processed for every generated token.
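Here's a toy numpy sketch of that split (made-up sizes and weights, not any real model's code): prefill computes all the prompt K/V vectors in one parallel pass, and decode only appends one new K/V pair per generated token while attending over the cache.

```python
import numpy as np

D = 16                                   # toy hidden size
rng = np.random.default_rng(0)
Wq, Wk, Wv = (rng.normal(size=(D, D)) for _ in range(3))

def attend(q, K, V):
    """One query attending over all cached keys/values (single head)."""
    scores = K @ q / np.sqrt(D)          # (seq,)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ V                   # (D,)

# --- Prefill: every prompt token processed in one parallel pass ---
prompt = rng.normal(size=(5, D))         # 5 prompt-token embeddings
K_cache = prompt @ Wk                    # all keys at once
V_cache = prompt @ Wv                    # all values at once

# --- Decode: one token at a time, re-using the cached K/V ---
x = prompt[-1]
for _ in range(3):                       # generate 3 toy tokens
    q = x @ Wq
    x = attend(q, K_cache, V_cache)      # stand-in for the next token's state
    K_cache = np.vstack([K_cache, x @ Wk])   # append, don't recompute
    V_cache = np.vstack([V_cache, x @ Wv])
```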
What you may be thinking of is that once the model stops predicting at the end of an API call, that state is dropped from RAM; if you come back with another message in the chat, the LLM re-reads the chat log up to that point as part of the new prompt. So this is true on a per-message basis, but not on a per-word or per-token basis.
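Roughly what that looks like from the client side (a sketch only; `generate` is a hypothetical stand-in for whatever API you call): the full history is re-sent on every turn, and the server re-processes it as prompt.

```python
history = []

def chat_turn(user_text, generate):
    """Append the user's message, re-send the whole log, store the reply."""
    history.append({"role": "user", "content": user_text})
    reply = generate(history)            # entire chat log re-read as prompt
    history.append({"role": "assistant", "content": reply})
    return reply
```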