Wouldn’t that race against any other thread in the process? I guess you could stop all threads when you hit the breakpoint and start them again after you restore the breakpoint, but the synchronisation of that would be really tricky too.
You could also keep a clean mapping table (i.e. the code with no breakpoints installed) and install it only for the thread doing the step, then revert that thread to the normal mapping table, with the breakpoint, once the step is done. Since you are only modifying the executable section, having multiple copies of it exist transiently should cause no data inconsistency, as long as you are not using self-modifying code.
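To make the two-mapping idea concrete, here is a toy model in Python. It is only a sketch: `CodeViews`, `begin_step`, `end_step` and `fetch` are made-up names, "memory" is a plain byte buffer, and nothing here touches real page tables. Threads in a process normally share one address space, so a real version would need OS or hypervisor support that this does not model.

```python
INT3 = 0xCC  # x86 one-byte breakpoint opcode


class CodeViews:
    """Toy model of a clean and a patched copy of the same code."""

    def __init__(self, code: bytes):
        self.clean = bytes(code)        # code with no breakpoints installed
        self.patched = bytearray(code)  # code with breakpoints installed
        self.stepping = set()           # ids of threads using the clean view

    def set_breakpoint(self, addr: int) -> None:
        # Only the patched copy is modified; the clean copy never changes.
        self.patched[addr] = INT3

    def begin_step(self, tid: int) -> None:
        # Switch just this thread to the clean view for the single step.
        self.stepping.add(tid)

    def end_step(self, tid: int) -> None:
        # Switch the thread back to the view that contains the breakpoint.
        self.stepping.discard(tid)

    def fetch(self, tid: int, addr: int) -> int:
        # Return the byte that the given thread would execute at addr.
        view = self.clean if tid in self.stepping else self.patched
        return view[addr]


views = CodeViews(bytes([0x90, 0x90, 0xC3]))  # nop; nop; ret
views.set_breakpoint(1)
views.begin_step(tid=1)
stepping_thread_sees = views.fetch(1, 1)  # original byte
other_thread_sees = views.fetch(2, 1)     # still the breakpoint
views.end_step(tid=1)
```

The point is that the breakpoint byte is never removed from the view the other threads execute, so there is no window in which they can run past it.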
1. Overwrite instruction with int 3.
2. When you hit the breakpoint, restore the original instruction and move the instruction pointer back by one byte so that it points at that instruction again.
3. Single-step over the original instruction by setting the trap flag (TF) in the thread's EFlags (Intel).
4. Restore the breakpoint with int 3.
5. Resume normally.
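The steps above can be sketched as a toy model in Python. This is not a real debugger: `ToyDebugger` and the `single_step` callback are hypothetical, and "memory" is just a `bytearray`. In a real debugger, the single step would be done by setting the trap flag and resuming the thread.

```python
INT3 = 0xCC  # x86 one-byte breakpoint opcode


class ToyDebugger:
    """Toy model: patches bytes in a buffer, traces no real process."""

    def __init__(self, code: bytes):
        self.memory = bytearray(code)
        self.saved = {}  # breakpoint address -> original byte

    def set_breakpoint(self, addr: int) -> None:
        # Step 1: remember the original byte, then overwrite it with int 3.
        self.saved[addr] = self.memory[addr]
        self.memory[addr] = INT3

    def step_over(self, addr: int, single_step) -> None:
        # Step 2: restore the original instruction byte.
        self.memory[addr] = self.saved[addr]
        # Step 3: execute exactly one instruction (modelled by a callback).
        single_step(addr)
        # Step 4: put the int 3 back so the breakpoint fires next time.
        self.memory[addr] = INT3
        # Step 5: the caller would now resume the thread normally.


dbg = ToyDebugger(bytes([0x90, 0x90, 0xC3]))  # nop; nop; ret
dbg.set_breakpoint(1)
armed_byte = dbg.memory[1]

seen_during_step = []
dbg.step_over(1, lambda addr: seen_during_step.append(dbg.memory[addr]))
rearmed_byte = dbg.memory[1]
```

During the callback, the shared buffer holds the original byte rather than `0xCC`. That is the window in which another thread could run through the address without hitting the breakpoint.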