…Because it has been trained, partly by manual human effort, specifically to predict tokens that make up a meaningful dialogue, a Q&A session, or whatever, so that certain kinds of prefix token sequences, such as "you shall not discuss life, the universe, and everything", heavily downweight the parts of its high-dimensional concept space related to those concepts.
A dialogue is just a sequence of tokens with a specific structure that the network can learn and predict, just like it can learn and predict a sequence of valid board states in Go, or whatever. There’s really not much more to it.
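To make that concrete, here is a minimal sketch of what "a dialogue is just a token sequence with structure" means in practice. The role markers below (<|system|>, <|user|>, <|assistant|>) are made up for illustration, not any particular model's actual special tokens; real chat templates differ, but the idea is the same:

```python
from typing import Dict, List

# Hypothetical role markers; real models use their own special tokens.
ROLE_MARKERS = {
    "system": "<|system|>",
    "user": "<|user|>",
    "assistant": "<|assistant|>",
}

def serialize_dialogue(messages: List[Dict[str, str]]) -> str:
    """Flatten a structured chat into the single text stream the model sees.

    The network is trained to predict the next token of this flat stream;
    the "dialogue" survives only as a learned pattern of markers and turns.
    """
    parts = []
    for msg in messages:
        parts.append(f"{ROLE_MARKERS[msg['role']]}\n{msg['content']}\n")
    # At inference time, appending the assistant marker prompts the model
    # to continue the sequence as the assistant's turn.
    parts.append(ROLE_MARKERS["assistant"] + "\n")
    return "".join(parts)

dialogue = [
    {"role": "system",
     "content": "You shall not discuss life, the universe, and everything."},
    {"role": "user",
     "content": "What is the answer to everything?"},
]
print(serialize_dialogue(dialogue))
```

From the network's point of view there is no dialogue object at all, only this one flat sequence, exactly as a sequence of Go board states would be.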