
LLMs are Markov chains in the following sense: states are vectors of context-length many tokens. The model then defines a transition matrix: for a given context-length vector of tokens, it gives you the probabilities of the next context-length vector of tokens (the old vector shifted by one, with the newly sampled token appended).
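
A minimal sketch of that correspondence in Python, assuming a hypothetical model(state) that returns a next-token probability distribution for a window of token ids (names here are illustrative, not any real library's API):

    import random

    def markov_step(model, state):
        # state: tuple of n token ids, i.e. one state of the Markov chain.
        # model(state) is assumed to return a list of probabilities over
        # the vocabulary; it plays the role of one row of the (implicit)
        # transition matrix, indexed by the current state.
        probs = model(state)
        next_token = random.choices(range(len(probs)), weights=probs, k=1)[0]
        # Successor state: drop the oldest token, append the sampled one.
        return state[1:] + (next_token,)

The transition matrix is never materialized (it would have vocabulary-size^n rows); the network computes each row on demand.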



Could you elaborate on what context length means in this context? Maybe give an example?


The length of the input in tokens. For the simple case where tokens are just characters, an LLM does nothing but take a string of length n, the context length, and compute, for each character in the alphabet, the probability that it is the next character following the input. It then picks one character at random according to that distribution, outputs it as the first character of the response, appends it to the input, and discards the first character of the input to bring it back to length n. Repeating the entire process produces the next character of the response, and so on; see the sketch below.
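
A sketch of that loop, assuming a hypothetical next_char_probs(context) that returns the model's distribution over a toy alphabet for a length-n context:

    import random

    ALPHABET = "abcdefghijklmnopqrstuvwxyz "  # toy character vocabulary

    def generate(next_char_probs, prompt, n, length):
        # Keep only the last n characters as the sliding context window.
        context = prompt[-n:]
        out = []
        for _ in range(length):
            # Distribution over ALPHABET for the next character
            # (hypothetical model interface).
            probs = next_char_probs(context)
            # Sample one character according to that distribution.
            c = random.choices(ALPHABET, weights=probs, k=1)[0]
            out.append(c)
            # Append the sample, drop the oldest character: back to length n.
            context = (context + c)[-n:]
        return "".join(out)

Real LLMs do the same thing over subword tokens instead of single characters, but the sliding-window loop is identical.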



