You see this exact problem in task-based parallel libraries such as taskflow. It's easy to get burned by thread-local variables: a task might yield to the scheduler, which then runs another task (of the same type) on the same thread, and your thread-local state is corrupted.
You end up requiring more sophisticated object pools where you can check out and check in objects.
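For illustration, here is a minimal sketch of such a check-out/check-in pool. The names (`ObjectPool`, `Lease`, `check_out`, `idle_count`) are made up for this example and are not taken from taskflow or any other library. It assumes the pooled type is default-constructible and that a mutex-protected free list is good enough; a real pool would probably want per-worker caches.

```cpp
#include <cstddef>
#include <memory>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical pool type, for illustration only (not a taskflow API).
template <typename T>
class ObjectPool {
public:
    // RAII lease: the object goes back to the pool when the lease is destroyed,
    // regardless of which thread the owning task is running on at that point.
    class Lease {
    public:
        Lease(ObjectPool& pool, std::unique_ptr<T> obj)
            : pool_(&pool), obj_(std::move(obj)) {}
        Lease(Lease&&) noexcept = default;
        Lease(const Lease&) = delete;
        Lease& operator=(Lease&&) = delete;
        ~Lease() {
            if (obj_) pool_->check_in(std::move(obj_));  // moved-from leases hold nothing
        }
        T& operator*() { return *obj_; }
        T* operator->() { return obj_.get(); }

    private:
        ObjectPool* pool_;
        std::unique_ptr<T> obj_;
    };

    // Check out an idle object, or default-construct a new one if none is idle.
    Lease check_out() {
        std::unique_ptr<T> obj;
        {
            std::lock_guard<std::mutex> lock(mutex_);
            if (!free_.empty()) {
                obj = std::move(free_.back());
                free_.pop_back();
            }
        }
        if (!obj) obj = std::make_unique<T>();
        return Lease(*this, std::move(obj));
    }

    // Number of objects currently sitting idle in the pool.
    std::size_t idle_count() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return free_.size();
    }

private:
    void check_in(std::unique_ptr<T> obj) {
        std::lock_guard<std::mutex> lock(mutex_);
        free_.push_back(std::move(obj));
    }

    mutable std::mutex mutex_;
    std::vector<std::unique_ptr<T>> free_;
};
```

The point of the design is that the object is owned by the task (through the lease) rather than by the thread, so a yield in the middle of the task cannot hand the same object to another task. Note that this sketch does not reset an object when it is checked back in, and the pool has to outlive every lease.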
What bonzini is referring to is actually a more subtle issue: the compiler will CSE (common-subexpression-eliminate) the hidden thread_local address calculation even across function calls. So even if you are careful and do not assume that thread_local state is preserved across function calls, your code can still be wrong: if the coroutine resumes on a different thread, it will suddenly be accessing a thread_local owned by another thread. That might be as simple as dereferencing the wrong errno.
It is very hard to work around that in your code. The only practical solution is to never migrate coroutines to other threads.
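To make the hazard concrete, here is a small sketch that does by hand what the compiler may do implicitly. It does not use coroutines at all: it just caches the address of a thread_local on one thread and uses it from another, which is roughly the state a migrated coroutine ends up in if the address was computed before the suspension point. The names `tls_counter`, `cached_address_is_foreign` and `write_through_cached_address` are made up for this example.

```cpp
#include <thread>

// Illustrative thread_local; every thread gets its own instance.
thread_local int tls_counter = 0;

// Take the address on the calling thread, then compare it with the address
// the same expression yields on a second thread. The two instances are alive
// at the same time, so the addresses must differ.
bool cached_address_is_foreign() {
    int* cached = &tls_counter;  // stands in for the compiler's hoisted address
    bool differs = false;
    std::thread other([&] { differs = (cached != &tls_counter); });
    other.join();
    return differs;
}

// Write through the cached address from the second thread: the update lands
// in the first thread's instance, not in the one the second thread owns.
int write_through_cached_address() {
    tls_counter = 1;
    int* cached = &tls_counter;
    std::thread other([cached] {
        tls_counter = 100;  // the second thread's own instance
        *cached += 1;       // silently modifies the first thread's instance
    });
    other.join();  // join orders the write before the read below
    return tls_counter;
}
```

In the explicit version you can see the stale pointer. In the coroutine case there is no pointer in your source to look for, since the address computation is hidden in the generated code.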