I was wondering how something so basic could go unnoticed for so long. Halfway down the page on OP's link, a user u/rbmm provides a compelling answer: that there are (possibly expected?) cases where a thread trying to acquire the lock in shared mode can accidentally get it in exclusive mode instead. This is due to interleaving of atomic bit test-and-[re]set operations between the (shared mode acquire) thread and the (exclusive mode release) thread running simultaneously.
The repro code holds the shared lock while waiting for all other threads to acquire the shared lock, and thus deadlocks if any of the worker threads accidentally gets an exclusive lock. In "normal" use cases where the lock is used to protect some shared resource, threads holding the lock don't wait for each other, so there is no deadlock.
https://github.com/rbmm/SRW-2 - possible better repro code . where i guarantee repro this without hundreds loops. i be say that "question" in RtlReleaseSRWLockExclusive implementation. it first remove Lock bit from SRW (and after this any thread can enter to lock) and then call RtlpWakeSRWLock. but this 2 operations not atomic. if in the middle (after Lock Bit removed and before RtlpWakeSRWLock executed) another thread acquire shared access - it got exclusive. and RtlpWakeSRWLock also "fail" after this - not wake any waiters. but when shared/exlusive owner then release lock..RtlpWakeSRWLock will be called again and self job. i be of course use bit another implementation, link to which i paste in another comment here
The lock in unfair. We are unfair because you get better performance by not allowing the lock hold time to be extended by context swap. As a result, we always unconditionally release a lock when an exclusive acquire is release or the last shared guy releases. Then we go find people to wake. In this gap the lock can be stollen. The threads that steal all look like exclusive guys.
My overall rule of thumb is that in the presence of any exclusive acquires you can never assume a shared acquire is compatible with any other shared acquire. A second shared acquire might wait for the first acquire to exit. We do this for example, so a stream of readers don't starve a writer. This is another case like that but somewhat less obvious.
I have no idea if the owners will attempt to fix or take the view that you're trying to assume something we don't guarantee. I'll admit that this is a kind of strange case. You could obviously queue a wait block and the in progress waker will sort this out. Could well get a performance impact from that.
I tried to explain this general rule of thumb to you this morning when you contacted me. I could not understand much of what you said.
> My overall rule of thumb is that in the presence of any exclusive acquires you can never assume a shared acquire is compatible with any other shared acquire.
Contract #1: In reader writer locks in general, after I share-acquire a lock, I know that there are no active exclusive owners of that lock and won't be until I release my shared lock. I can also expect that as long as I hold this shared lock, other threads can share-acquire the same lock without waiting. ony share-exclusive threads would have to wait.
This general contract seems useful to me.
The contract you're describing, contract #2, is one in which shared-acquire is an optional optimization over exclusive-acquire (not a contractual guarantee) and that the system is free to promote shared to exclusive lock acquisitions.
This other contract seems finnicky and error-prone.
It's typical not to allow a second share acquire to proceed if we have an exclusive waiter:
t1: share
t2: exclusive so waits
t3: share also waits
Thats how we arrive at the rule that shared acquires in anything, but a trivial system (no exclusive acquires) may not be compatible.
So, contract #1 is typically not satisfied. Of course, this particular case is slightly different and so you might decide to support it.
Blocking new readers when a writer arrives is perfectly good and desirable. Blocking readers when the writer finishes, and there isn't any new queued, is definitely not.
I meant that contract #1 is that when I, thread A, already have a lock held in shared mode, I can expect shared-acquires on thread B to succeed without my first giving up my shared lock. The problem that we're discussing here is that I, A, can ask for a shared lock but actually and unknowingly get an exclusive lock. There's no exclusive-acquire involved.
"other threads can share-acquire the same lock without waiting."
can but not always and mandatory. you really allow to system enter another shared requestor. but only allow,not demand. system not let shared requestor enter to lock, if before it the exclusive request will be. after first exclusive request - all next shared request will block, util this exclusive request not acquire and then release lock. can be and some another reason, in the end - this is only optimization - you only *allow* multiple shared enter at once. so i be not say that this is implementation bug
yes. after any thread request exclusive access to lock, the sequential shared acquire request will block. this iswell known fact, i hope. this is example when shared request will be blocked, despite shared owner(s) now inside lock
if say true - i here in very strange position. i am by self under debugger research exacly what happens in concrete case and create repro code. i think that implementation of the RtlReleaseSRWLockExclusive is not the best and may be really containing "bug" as i in more details describe in comment on reddit. but from another side, if you want pure formal c++ rules - can you explain - in what exactly was bug in concre case ? what rule/guarantee is violated ? why demo code decide that ALL shared waiters can at once acquire the lock. formal documentation not state this. this intuitive must be true, because no exclusive request more. but.. i really dont know. and i also for fun create own implementation of SRW lock ( if someone interested to look -https://github.com/rbmm/PushLock ) which free from this roblem - i always ( hope) do only single atomic change to lock state during api call. and finally - sorry for my english and too long answer
Why does it have to use bit test and set and interleave with other threads, though. AIUI you can use a CAS loop to implement any RMW atomically over word-sized (or double word sized, on many platforms) data. That seems like a no-brainer.
For comparison, the Rust implementation for lightweight RWLocks on futex-capable *nix platforms is here: https://doc.rust-lang.org/stable/src/std/sys/unix/locks/fute... It sets the "reader counter" in the underlying atomic to a special value to signal that the lock is set for exclusive access. So a reader thread acquiring the lock as shared can never result in this kind of bug. Bits are used to signal whether readers or writers are currently waiting on a lock, but this just cannot turn a lock that's acquired for shared access into exclusive, or vice versa.
We use bit test and set as it causes us to get the cache line exclusive. This avoids using the prefetch write before fetching if we were to use CompareExchange. somebody was trying to talk to me about this stuff this morning but I didn't understand him. You can't expect a reader to always be compatible with other readers otherwise you livelock with a constant stream of readers. So we become incompatible. I am unsure if there is something beyond this.
Interesting comment for sure. Of course if this is intended behaviour it should at least be properly documented, since other implementations don't seem to do this random "upgrading" of a shared to an exclusive lock and it does create an issue whenever readers might be waiting on one another while holding the lock as shown in OP's code.
If that "rbmm" is the same person as I've seen on other sites, and the characteristic non-native English is a clue that it is, he certainly knows his stuff.
threads holding the lock don't wait for each other
what is shared mode ? this is by fact optimization for speed, if we need read-only access to data, we allow to system let another thread into the section that requests shared access also
allow but NOT DEMAND this. If one thread has acquired the shared lock , other thread can acquire the shared lock too. but only CAN. in some case system not let another thread enter to lock, despite it also request share access. one case: if another rthread request exclusive acess - he begin wait and after this - any thread which acquire even shared access to lock - also begin wait
If lock_shared is called by a thread that already owns the mutex in any mode (exclusive or shared), the behavior is undefined.
and
Shared mode SRW locks should not be acquired recursively as this can lead to deadlocks when combined with exclusive acquisition.
why is this ? because if between 2 calls to lock_shared ( AcquireSRWLockShared ) another thread call AcquireSRWLockExclusive - the second call is block.
the code in example make assumption that ALL threads can enter to lock at once. that if one thread enter to lock in shared mode, another thread also ALWAYS can enter to lock in shared mode (if no exclusive requests). but i not view clear formalization of such requirement. and we must not based on this.
i be will add next rule:
thread inside lock must not wait on another thread to enter this lock
this is obvivius for exlusive access, but not obvivous to shared. but must be cleare stated along with the recursive rule ( should not be acquired recursively as this can lead to deadlocks, even in shared mode)
The read threads block because the ReadWriteLock algorithm tries to prevent thread starvation (i.e. when the exclusive lock never gets acquired). Most ReadWriteLock implementation alternate between giving the read locks then the exclusive locks access to the lock.
of course - SRW lock allow shared access only if no waiters ( request to exclusive access) on lock. so even if lock in shared mode, new shared request can block, if was waiter(exclusive) already. in case OP no exactly this, but anyway - i be say that shared access is only hint to system, that it can optimize access - and allow multiple shared threads inside lock. but this was not always
The repro code holds the shared lock while waiting for all other threads to acquire the shared lock, and thus deadlocks if any of the worker threads accidentally gets an exclusive lock. In "normal" use cases where the lock is used to protect some shared resource, threads holding the lock don't wait for each other, so there is no deadlock.
Interesting stuff!