One thing I've been using is a spinlock where the RT thread only ever calls try_lock() and takes a fallback path if that fails, while the NRT thread(s) call try_lock() in a loop with pause instructions and occasional thread yielding (or even sleeping). This can waste a lot of CPU cycles when an NRT thread tries to acquire a lock that is already held by the RT thread, but assuming that only happens rarely, it's a reasonable trade-off.
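For illustration, here is a rough sketch of what such a lock could look like in C++. It is not my exact code: the class name `RtSpinlock`, the helper `rt_try_with_lock`, the spin thresholds (64 and 128) and the 100 µs sleep are all made-up placeholders that you would tune for your own workload. The pause instruction is only emitted on x86; on other architectures the sketch just falls through to a no-op.

```cpp
#include <atomic>
#include <chrono>
#include <thread>
#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>  // _mm_pause
#endif

// Hypothetical name; a minimal test-and-test-and-set spinlock.
class RtSpinlock {
public:
    // Safe to call from the RT thread: never blocks, never makes a syscall.
    bool try_lock() noexcept {
        // Cheap relaxed read first, so a held lock does not cause
        // repeated read-modify-write traffic on the cache line.
        return !locked_.load(std::memory_order_relaxed) &&
               !locked_.exchange(true, std::memory_order_acquire);
    }

    // NRT threads only: spin with pause, then yield, then sleep.
    // The thresholds and sleep duration are arbitrary assumptions.
    void lock() noexcept {
        for (int spins = 0; !try_lock(); ++spins) {
            if (spins < 64) {
                cpu_pause();
            } else if (spins < 128) {
                std::this_thread::yield();
            } else {
                std::this_thread::sleep_for(std::chrono::microseconds(100));
            }
        }
    }

    void unlock() noexcept {
        locked_.store(false, std::memory_order_release);
    }

private:
    static void cpu_pause() noexcept {
#if defined(__x86_64__) || defined(__i386__)
        _mm_pause();  // hint to the CPU that this is a spin-wait loop
#endif
    }

    std::atomic<bool> locked_{false};
};

// Hypothetical RT-side helper: runs fn under the lock if it is free.
// Returns false when the lock is busy, so the caller can take its
// fallback path (e.g. reuse the previous state) instead of waiting.
template <typename Fn>
bool rt_try_with_lock(RtSpinlock& lock, Fn&& fn) {
    if (!lock.try_lock()) {
        return false;
    }
    fn();
    lock.unlock();
    return true;
}
```

On the RT side you would call something like `rt_try_with_lock(lock, [&] { /* swap in new parameters */ })` once per callback and simply carry on with the old data when it returns false. The NRT side calls `lock()` / `unlock()` as usual (it also works with `std::lock_guard<RtSpinlock>`), and should keep its critical section as short as possible so the RT thread rarely hits the fallback.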