You can model it as a state machine, where the LLM decides to what state it wants to advance. In terms of developer ergonomics, strongly typed outputs help. You can for example force a function call at each step, where one of the call arguments is an enum specifying the state to advance to.
Shoot me an email if you want to discuss specifics!
Are you using a separate state manager + function calling so the LLM knows where it is?