I'm curious, what does your human verification process look like? Does it involve a separate interface or a generated report of some kind? I'm currently working on a tool for personal use that records actions and triggers them at a later stage when a specified event occurs. For verification, I'm generating a CSV report after the process is complete and backing it up with screen recordings.
It's a separate interface where the output of the LLM is rated for safety, and anything unsafe opens a ticket to be acted upon by the medical professionals.