LLMs are Markov chains in the following sense: states are vectors of context-length many tokens, and the model describes a transition matrix: for a given
context-length sized vector of tokens, it gives you the probabilities for the next context-length sized vector of tokens.
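Written out (using p_theta for the model's next-token distribution, a notation not in the original), the transition probabilities of this chain would look like:

P(s' \mid s) =
\begin{cases}
p_\theta(x \mid s_1, \dots, s_n) & \text{if } s' = (s_2, \dots, s_n, x) \text{ for some token } x,\\
0 & \text{otherwise,}
\end{cases}

where s = (s_1, \dots, s_n) is the current window of n tokens: the new state is the old one shifted by one position, with the sampled token appended at the end.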
The context length is the length of the input in tokens. For the simple case where tokens are just single characters, an LLM does nothing but take a string of length n, the context length, and calculate, for each character in the alphabet, the probability that this character is the next character following the input. It then picks one character at random according to that distribution, outputs it as the first character of the response, appends it to the input, and discards the first character of the input to bring it back to length n. Repeating the entire process produces the next character of the response, and so on; a sketch of this loop follows below.
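A minimal sketch of that loop in Python, assuming a hypothetical function next_char_distribution(window) that returns the model's probability for each character in the alphabet given the current window of n characters (this function and its name are placeholders, not part of any real library):

```python
import random

def generate(prompt, n, num_chars, next_char_distribution):
    """Character-level autoregressive generation with a fixed window of n characters."""
    window = prompt[-n:]      # the state: only the last n characters matter
    output = []
    for _ in range(num_chars):
        dist = next_char_distribution(window)   # dict: char -> P(char | window)
        chars, probs = zip(*dist.items())
        next_char = random.choices(chars, weights=probs, k=1)[0]  # sample one char
        output.append(next_char)
        window = (window + next_char)[-n:]      # append it, drop the oldest char
    return "".join(output)
```

Note that the loop never looks at anything outside the current window, which is exactly why the process is Markovian over windows of length n.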