
To elaborate on this: the LLM can be viewed as a function that takes the context as an argument and produces a probability distribution for the next token over all known tokens.
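
As a minimal sketch of that "function" view (assuming the Hugging Face transformers library and GPT-2 as a stand-in model), you can feed a context in and read off the full next-token distribution:

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")
    model.eval()

    context = "The capital of France is"
    input_ids = tokenizer(context, return_tensors="pt").input_ids

    with torch.no_grad():
        logits = model(input_ids).logits       # shape: (1, seq_len, vocab_size)

    next_token_logits = logits[0, -1]          # scores for the next token only
    next_token_probs = torch.softmax(next_token_logits, dim=-1)  # sums to 1 over the vocab

    # Show the five most probable next tokens.
    top_probs, top_ids = next_token_probs.topk(5)
    for p, i in zip(top_probs, top_ids):
        print(f"{tokenizer.decode(int(i)):>10}  {p.item():.3f}")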

Most inference setups sample from that distribution using some combination of sampling rules (temperature, top-k, top-p, and so on), but there's nothing stopping you from always taking the most probable token (temperature = 0) and being fully deterministic. The results are quite bland, but it's perfect for extraction tasks.
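
Here's a sketch of that decoding choice, reusing the next_token_logits from the snippet above: greedy decoding (the temperature = 0 case) always takes the argmax and is deterministic, while sampling draws from the temperature-scaled distribution and varies from run to run:

    import torch

    def next_token(logits: torch.Tensor, temperature: float = 1.0) -> int:
        """Pick the next token id from a vector of vocabulary logits."""
        if temperature == 0.0:
            # "Temperature 0": deterministic greedy decoding -- always take
            # the single most probable token.
            return int(torch.argmax(logits))
        # Otherwise scale the logits and sample from the resulting distribution.
        probs = torch.softmax(logits / temperature, dim=-1)
        return int(torch.multinomial(probs, num_samples=1))

    # greedy = next_token(next_token_logits, temperature=0.0)   # same every run
    # sampled = next_token(next_token_logits, temperature=0.8)  # varies run to run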

(Note: GPT-4 is not fully deterministic; there are no official details, but the running theory is that it's a mixture-of-experts model and that the expert routing algorithm is not deterministic, or depends on the resources available.)
