Hacker News
Bug in reader/writer locks in Windows API (reddit.com)
287 points by Georgelemental 10 months ago | 138 comments



I was wondering how something so basic could go unnoticed for so long. Halfway down the page on OP's link, a user u/rbmm provides a compelling answer: that there are (possibly expected?) cases where a thread trying to acquire the lock in shared mode can accidentally get it in exclusive mode instead. This is due to interleaving of atomic bit test-and-[re]set operations between the (shared mode acquire) thread and the (exclusive mode release) thread running simultaneously.

The repro code holds the shared lock while waiting for all other threads to acquire the shared lock, and thus deadlocks if any of the worker threads accidentally gets an exclusive lock. In "normal" use cases where the lock is used to protect some shared resource, threads holding the lock don't wait for each other, so there is no deadlock.

Interesting stuff!


https://github.com/rbmm/SRW-2 - a possibly better repro, where I reliably reproduce this without hundreds of loop iterations. I'd say the "question" is in the RtlReleaseSRWLockExclusive implementation: it first removes the Lock bit from the SRW word (after which any thread can enter the lock) and only then calls RtlpWakeSRWLock. These two operations are not atomic. If, in that window (after the Lock bit is removed and before RtlpWakeSRWLock runs), another thread acquires shared access, it actually gets exclusive access - and RtlpWakeSRWLock then "fails", waking no waiters. But when that shared/exclusive owner later releases the lock, RtlpWakeSRWLock is called again and does its job. I would of course use a slightly different implementation, a link to which I pasted in another comment here.
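A hypothetical sketch of the release sequence being described (pseudocode using the names from the comment above; not the actual Windows implementation):

```
ReleaseExclusive(lock):
    atomically clear the Lock bit      # step 1: lock word now looks free
    # <-- window: a concurrent shared acquire can see the free word, take
    #     the fast path, and end up as an exclusive-style owner
    RtlpWakeSRWLock(lock)              # step 2: sees the lock held again,
                                       #         so it wakes no waiters
```

The parked waiters only get woken later, when that accidental owner releases the lock and RtlpWakeSRWLock runs again.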


The lock is unfair. We are unfair because you get better performance by not allowing the lock hold time to be extended by a context swap. As a result, we always unconditionally release the lock when an exclusive acquire is released or the last shared guy releases. Then we go find people to wake. In this gap the lock can be stolen. The threads that steal all look like exclusive guys. My overall rule of thumb is that in the presence of any exclusive acquires you can never assume a shared acquire is compatible with any other shared acquire. A second shared acquire might wait for the first acquire to exit. We do this, for example, so a stream of readers doesn't starve a writer. This is another case like that, but somewhat less obvious. I have no idea if the owners will attempt to fix it or take the view that you're trying to assume something we don't guarantee. I'll admit that this is a kind of strange case. You could obviously queue a wait block and the in-progress waker will sort this out; that could well have a performance impact. I tried to explain this general rule of thumb to you this morning when you contacted me. I could not understand much of what you said.


> My overall rule of thumb is that in the presence of any exclusive acquires you can never assume a shared acquire is compatible with any other shared acquire.

Contract #1: In reader/writer locks in general, after I share-acquire a lock, I know that there are no active exclusive owners of that lock, and there won't be until I release my shared lock. I can also expect that as long as I hold this shared lock, other threads can share-acquire the same lock without waiting; only exclusive-acquiring threads would have to wait.

This general contract seems useful to me.

The contract you're describing, contract #2, is one in which shared-acquire is an optional optimization over exclusive-acquire (not a contractual guarantee), and the system is free to promote shared lock acquisitions to exclusive ones.

This other contract seems finicky and error-prone.

Can't we have SRWLOCK implement contract #1?


It's typical not to allow a second shared acquire to proceed if we have an exclusive waiter:

    t1: share
    t2: exclusive, so waits
    t3: share, also waits

That's how we arrive at the rule that shared acquires in anything but a trivial system (one with no exclusive acquires) may not be compatible. So contract #1 is typically not satisfied. Of course, this particular case is slightly different, and so you might decide to support it.


Blocking new readers when a writer arrives is perfectly good and desirable. Blocking readers when the writer finishes, and there isn't a new one queued, is definitely not.


I meant that contract #1 is that when I, thread A, already have a lock held in shared mode, I can expect shared-acquires on thread B to succeed without my first giving up my shared lock. The problem that we're discussing here is that I, A, can ask for a shared lock but actually and unknowingly get an exclusive lock. There's no exclusive-acquire involved.


"other threads can share-acquire the same lock without waiting." can but not always and mandatory. you really allow to system enter another shared requestor. but only allow,not demand. system not let shared requestor enter to lock, if before it the exclusive request will be. after first exclusive request - all next shared request will block, util this exclusive request not acquire and then release lock. can be and some another reason, in the end - this is only optimization - you only *allow* multiple shared enter at once. so i be not say that this is implementation bug


So we have to treat shared acquires as exclusive ones just in case an exclusive acquire comes along? What if I know that my program doesn't do that?


Yes. After any thread requests exclusive access to the lock, subsequent shared acquire requests will block. This is a well-known fact, I hope. It's an example of a shared request being blocked despite shared owner(s) currently being inside the lock.

also look my 2 comments on reddit: https://www.reddit.com/r/cpp/comments/1b55686/comment/ktfhjs... and https://www.reddit.com/r/cpp/comments/1b55686/comment/ktfggu...

To tell the truth, I'm in a very strange position here. I personally researched under a debugger exactly what happens in this concrete case and created the repro code. I think the implementation of RtlReleaseSRWLockExclusive is not the best and may really contain a "bug", as I describe in more detail in a comment on reddit. But on the other hand, if you want purely formal C++ rules - can you explain what exactly the bug is in this concrete case? What rule/guarantee is violated? Why does the demo code assume that ALL shared waiters can acquire the lock at once? The formal documentation does not state this. Intuitively it must be true, because there are no more exclusive requests, but... I really don't know. And, also for fun, I created my own implementation of an SRW lock (if someone is interested: https://github.com/rbmm/PushLock) which is free from this problem - I always (I hope) make only a single atomic change to the lock state per API call. And finally - sorry for my English and the too-long answer.


Why does it have to use bit-test-and-set and interleave with other threads, though? AIUI you can use a CAS loop to implement any RMW atomically over word-sized (or double-word-sized, on many platforms) data. That seems like a no-brainer.

For comparison, the Rust implementation for lightweight RWLocks on futex-capable *nix platforms is here: https://doc.rust-lang.org/stable/src/std/sys/unix/locks/fute... It sets the "reader counter" in the underlying atomic to a special value to signal that the lock is set for exclusive access. So a reader thread acquiring the lock as shared can never result in this kind of bug. Bits are used to signal whether readers or writers are currently waiting on a lock, but this just cannot turn a lock that's acquired for shared access into exclusive, or vice versa.


We use bit test and set because it causes us to get the cache line exclusive. This avoids using a prefetch-write before fetching, as we would if we used CompareExchange. Somebody was trying to talk to me about this stuff this morning but I didn't understand him. You can't expect a reader to always be compatible with other readers, otherwise you livelock with a constant stream of readers. So we become incompatible. I am unsure if there is something beyond this.


Interesting comment, for sure. Of course, if this is intended behaviour it should at least be properly documented, since other implementations don't seem to do this random "upgrading" of a shared lock to an exclusive one, and it does create an issue whenever readers might be waiting on one another while holding the lock, as shown in OP's code.


If that "rbmm" is the same person as I've seen on other sites, and the characteristic non-native English is a clue that it is, he certainly knows his stuff.

> threads holding the lock don't wait for each other

Unless you're doing nested locking.


> Unless you're doing nested locking.

In my experience, needing to do nested locking is a sign you're up the wrong creek.

I've rewritten some interfaces and implementations from using nested locking to non-nested, and they became easier to use and much faster.

Not saying there's never a place for them, but I avoid nested locking like the plague.


Yes, that's me (that "rbmm"), #opentowork


What is shared mode? It is in fact an optimization for speed: if we need only read-only access to data, we allow the system to let other threads that also request shared access into the section.

Allow, but NOT DEMAND. If one thread has acquired the shared lock, another thread can acquire the shared lock too - but only CAN. In some cases the system will not let another thread into the lock despite it also requesting shared access. One such case: if another thread requests exclusive access, it begins to wait, and after that any thread that requests even shared access to the lock also begins to wait.

> If lock_shared is called by a thread that already owns the mutex in any mode (exclusive or shared), the behavior is undefined.

and

> Shared mode SRW locks should not be acquired recursively as this can lead to deadlocks when combined with exclusive acquisition.

Why is this? Because if, between two calls to lock_shared (AcquireSRWLockShared), another thread calls AcquireSRWLockExclusive, the second shared call will block.

The code in the example assumes that ALL threads can enter the lock at once - that if one thread has entered the lock in shared mode, another thread can ALWAYS also enter in shared mode (as long as there are no exclusive requests). But I have not seen a clear formalization of such a requirement, and we must not rely on it.

I would add the following rule:

a thread inside the lock must not wait on another thread to enter this lock

This is obvious for exclusive access, but not obvious for shared. It must be clearly stated alongside the recursion rule (should not be acquired recursively as this can lead to deadlocks, even in shared mode).


The read threads block because the ReadWriteLock algorithm tries to prevent thread starvation (i.e. the exclusive lock never getting acquired). Most ReadWriteLock implementations alternate between granting the read locks and the exclusive lock.


Of course - an SRW lock allows a new shared acquire only if there are no waiters (exclusive requests) on the lock. So even if the lock is in shared mode, a new shared request can block if there was already an exclusive waiter. In the OP's case that's not exactly what happens, but anyway - I would say that shared access is only a hint to the system that it can optimize access and allow multiple shared threads inside the lock. But that will not always be so.


Subtle bugs in Reader/Writer locks do not surprise me. I worked on an in-house implementation based on Win32 (before C++11 and std::shared_mutex) and my recollection is that although the implementation sounds simple it is exceedingly easy to make subtle mistakes.

The experience left me with such a bad feeling for shared locks that I tend to avoid them unless absolutely required. When I last tested std::shared_mutex, performance was so poor compared to std::mutex that double-buffering the data protected by a simple mutex was much faster.

This was a great post by the original Redditor.


Agreed. I avoid reader-writer locks unless absolutely required and benchmarks prove it worthwhile.

Their usage often fails to outperform a regular lock due to the additional overhead. They seem to make sense only in specific high-contention scenarios where the arrival rate is high and/or the critical section has a long duration [1].

[1]: https://petsta.net/blog/2022-09-30-rwmutex/ - Go specific, but I suspect these results hold true for most implementations of reader-writer locks.


SRWLock seems like a situation where it's small enough for you to construct proofs that what you did is correct, and important enough that the enormous expense of such proof is justified.

The C++ std::mutex provided in Microsoft's STL is ludicrously big, whereas SRWLocks are the same size as a pointer. In one sense being ludicrously big is great - less risk the std::mutex will mistakenly share a cache line with your unrelated data, but for most people this makes std::mutex annoyingly expensive.


Maybe my memory is faulty, but I believe it was analyzed with Leslie Lamport's tools. Of course, you're building a model of how it should work, and you might have faults in that.


C++ std::mutex provided in Microsoft's STL uses SRWLocks underneath, same size as a pointer.

Sadly, SRWLocks have yet another bug: they are very unfair. Write a loop which locks std::mutex inside the body, and no other thread will be able to grab the mutex, even though the loop repeatedly releases and re-acquires it. Critical sections are way more fair.


> C++ std::mutex provided in Microsoft's STL uses SRWLocks underneath, same size as a pointer.

The SRWLocks are indeed the size of a pointer, the std::mutex is not for ABI reasons.


Critical sections are unfair also. It could be differences in spin count that you are seeing.


Some time ago I ran some tests - 256 threads competing on a small number of cache lines - and found that all of them (CreateMutex, CRITICAL_SECTION and SRWLOCK) were quite fair.

The most successful thread was only 25%, 15% and 9% ahead of the least successful one, respectively. By contrast, in my simple usermode spinlock the unfairness would be 1000% or even 2000%.


I did some work with testing a cross platform in-house library for read-write locking.

We tested cmpxchg16b and found the performance was terrible with more than 4 cores.

Ended up using spin-locks similar to Linux kernel RCU.


It has been my experience that anything related to concurrency can be full of subtle edge cases. I tend to avoid it completely unless absolutely necessary.


Several years ago I did my own implementation of SRW/PushLocks - of course based on the original NeillC code: https://github.com/rbmm/SRW_ALT/tree/main/PushLock-ALT

They have slightly worse performance compared to MS under high contention, but in a test with this OP case they work well. In my tests I have not found bugs in the implementation, but of course I can't be sure none exist - the logic is really very complex. Nobody has tested this; however, it is very simple to replace the SRW calls with my implementation, via macros in the header file.


If you have many readers and a single writer there's no point in the readers blocking each other.


True, but in my experience the overhead of std::shared_mutex outweighs the benefits. Other approaches include:

* breaking up the lock so different threads can access different parts of your data structure concurrently.

* double-buffering (also called ping-pong buffers) where you effectively keep two copies of your data structure. The readers can access one without blocking, a single writer can modify the other and then swap.

* just accepting that reads will block each other with a std::mutex and work on minimizing the amount of time spent in the lock. This can actually work out quicker depending on your access patterns.

As always, careful profiling with real data is required to figure out what is better.


The problem with double-buffering is that you still need to know when all the readers are no longer using one of the copies.

   T1: writer populates copy1
   T2: readers access copy1
   T3: writer populates copy2, and swaps
   T4: writer populates copy1, and swaps
At T4 a reader from T2 could still be accessing the old data structure.

Unless I'm overthinking it.


No, you are right. One solution is for the readers to access the buffer through a shared_ptr: they can hold onto the old version for as long as they need it while the writer makes changes and creates a wholly new data structure. It is also possible for the writer to block new readers while doing the swap. Tradeoffs everywhere, depending on whether you need readers to see changes that occurred after they started accessing the data.


Nope. Ironically, I ran into this category of issue today with some buffer reuse in a multi-threaded system.


Sad thing is, Reader / Writer locks are pretty tempting to use as they appear to be lightweight.


Misleading title?

This is a Windows API bug with the slim reader/writer (SRW) locks. It's just that the bug was discovered via std::shared_mutex as that is implemented using SRW locks.

SRW locks: https://learn.microsoft.com/en-us/windows/win32/sync/slim-re...

Confirmation from a Microsoft employee that the bug has been raised internally with the Windows API team: https://old.reddit.com/r/cpp/comments/1b55686/maybe_possible...


Ok, I've changed the title to say that. Thanks!


Yes - for example, if you create a Rust std::sync::RwLock on Windows, it will literally be an SRWLock, because Microsoft advertises this API as having exactly the behaviour Rust wants, so why would you build something worse instead?

Rust's Mutex on Windows is also an SRWLock but it can't hit this bug because it deliberately only uses the exclusive locking.


So it doesn't use a shared lock when .read() is used?


Rust has both Mutex and RWLock. The Mutex only uses exclusive locks, there's no distinction between "read" and "write".


Oh sorry, my morning brain thought the comment was all about RwLock.


Some reddit comments mention it reproduces back to Vista (2008). I am kind of shocked no one has noticed this bug in all that time. I guess under typical rwlock usage you just get random instances of shared lockers unable to acquire the lock, and no deadlock - but still.


I think part of the reason is that there's a very similar code pattern which is user error, and avoiding that pattern tends to avoid this pattern as well.

The similar case occurs when you have:

    - 1+ threads holding a shared lock (Readers)
    - 1+ threads waiting to acquire an exclusive lock (Pending Writers)
    - 1+ threads trying to acquire the shared lock (Pending Readers)
    - 1+ Reader waiting on a Pending Reader

In this case the Pending Readers will be unable to acquire the shared lock - even though it is still in "read mode" - because in a fair RW lock Pending Writers are prioritised over Pending Readers, so as not to starve the writer side of the lock.


Vista is when that API was first implemented.

It's doing a pretty weird thing with the locks, I wonder what the actual use case was. Readers should almost never care about other readers. Typically you just grab the lock, read the thing, and release it. You always have to be super careful about deadlocks if you're holding a lock and also waiting around for something else to happen.


I'm curious if this also occurs in WINE's implementation.

I also want to test this on my highly customised XP install, which has been patched to add the SRW API among other extensions, and where I had also patched the kernel to fix a race condition causing a deadlock in the keyed event API that the SRW implementation is based on (maybe it's this same one, although in Vista+ they changed it significantly; the same edge case could still occur).


The code is here https://source.winehq.org/source/dlls/ntdll/sync.c#0474 and it uses compare exchange operations throughout, so it should be unaffected.

The ReactOS implementation is more involved https://doxygen.reactos.org/d1/db8/srw_8c_source.html but still, it uses mostly CAS operations both for the shared and the exclusive case. So it should be largely free from issues.


The WINE team has expressed frustration in the past about implementing the Windows APIs to spec... only to find out Microsoft didn't.


How did you patch the kernel? Like how is that possible?


With a hex editor, debugger, and skills that most developers these days seem to lack.

I patched the kernel in memory first, using a kernel debugger, to verify my fix worked before editing the file on disk.


Aren't kernel modules signed?


not in xp, iirc, vista was the first release requiring it


Windbg or SoftICE? :)


Windbg; it's free and doesn't require any setup to do this: https://learn.microsoft.com/en-us/windows-hardware/drivers/d...

(Working out how to patch such that I wouldn't crash the system if a process happens to call that API while it was in a half-modified state was also a fun problem...)

I dug out the details on the bug I patched, and it isn't the same as this one; it's a race condition with timeouts on waiting for keyed events, which I believe isn't applicable in this situation as there are no timeouts.


Here's an example showing how to patch a userspace binary:

http://www.malsmith.net/blog/patching-closed-software/

Note: not my blog. (Edit: removed a probably unnecessary, and likely inaccurate, detail).

Patching the kernel would involve a similar (but slightly more complicated) process.


Windows XP and Server 2003 sources (though incomplete, AFAIR) were leaked back in 2020.


> Bug in reader/writer locks in Windows API

Correction: Bug in reader/writer locks in Windows

Not in API.


Thank you. Blaming the API is like saying there's a problem with your gas pedal after you blew a head gasket.


The program has a bug: it's mixing atomic and non-atomic variables in the yield() checking loop. Non-atomic variables have no guarantee of cache consistency across threads. This can cause the loop to run forever.

    struct ThreadTestData {
        int32_t numThreads = 0;
        std::shared_mutex sharedMutex = {};
        std::atomic<int32_t> readCounter = 0;
    };

    // child thread
    void DoStuff(ThreadTestData* data) {
        data->readCounter.fetch_add(1);
        while (data->readCounter.load() != data->numThreads) {
            std::this_thread::yield();
        }
    }
The numThreads field is not an atomic variable. It's initialized to 0 and set to 5 in the main thread. Its memory address is then passed to the child threads to be checked in the yielding loop. Since it's non-atomic, there's no memory barrier instruction to force its new value (5) to propagate to all CPUs running the threads. A child thread might get the old value 0, and the yield-checking loop using it would never exit.

Since the main thread runs the code in an endless loop, the same numThreads memory allocated on the stack is being set to 0 and 5 repeatedly. Some of the child threads can get the old value in one pass of the loop. Hence the hang.


> there's no memory barrier instruction to force its new value (5) to propagate to all CPU's running the threads.

The equivalent of the memory barrier instructions is there, but it's hidden within the operating system code which creates and initializes a new thread. That is, the operating system ensures that the value in the current CPU (in this case, 5) is propagated to the CPU running the newly started thread, before the thread start routine (in this case, DoStuff) is called. The value is not modified while the child threads are running (it waits for the child threads to exit before clearing the value), so there's no chance of the child threads seeing the value being set back to zero.


Based on the Windows CreateThread API documentation [1], it doesn't say anything about memory synchronization guarantees. Does it provide them internally?

[1] https://learn.microsoft.com/en-us/windows/win32/api/processt...


That MSDN documentation is unfortunately silent on this, but the example in the documentation (at https://learn.microsoft.com/en-us/windows/win32/procthread/c...) only makes sense if the operating system guarantees the ordering.

The C++ standard (at least a draft of it I found on a quick web search) is more explicit: it says (https://eel.is/c++draft/thread.thread.constr) "The completion of the invocation of the constructor synchronizes with the beginning of the invocation of the copy of f." (see https://eel.is/c++draft/intro.races for more detail on that "synchronizes with"). Since the code in question is using std::thread, even if the operating system did not have the relevant guarantees, the C++ standard library would have the required memory barriers.


Any time somebody tells you how simple C is by comparison, point them at https://port70.net/~nsz/c/c11/n1570.html#6.2.4p5:

> An object whose identifier is declared with no linkage and without the storage-class specifier static has automatic storage duration, as do some compound literals. The result of attempting to indirectly access an object with automatic storage duration from a thread other than the one with which the object is associated is implementation-defined

(I bet in practice almost all implementations behave as an equivalent C++ would, as per your notes, so only the ordering is relevant. But people maintaining C implementations have on occasion shown themselves to be their users' enemies, so don't quote me on this!)


More generally, the OS is going to be doing some level of synchronization on its own - likely a spinlock - during the thread creation process. Those always include memory barriers, because otherwise the locks they define wouldn't actually work on out-of-order systems.


`numThreads` is written before the child threads that read it are started, so there is an explicit happens-before relationship and no data race. Before `numThreads` is reset, the child thread are joined.

There is no bug in the program, it is legal to use non-atomic variables across threads as long as they're correctly sequenced.


See my last paragraph above.


>Its memory address is then passed to the child threads to be checked in the yielding loop. Since it's non-atomic, there's no memory barrier instruction to force its new value (5) to propagate to all CPU's running the threads.

Each core would have to fetch the value from main memory, where it will be undoubtedly 5. There is no valid reordering (at least under x86) that would cause the thread to read 0.


The main thread runs the child-thread creation in an endless loop, repeatedly setting numThreads to 0 and to 5, back to 0 and to 5 again. Can the caches of the CPUs consistently keep up with the changes?


> repeatedly setting numThreads to 0 and to 5, back to 0 and to 5 again.

The reset to 0 and to 5 happens at the start of the loop. There's a happens-before relationship between it and the threads being created, and then again between the threads being joined and the loop cycling back. So there shouldn't be any data race here.


CPUs not running the main thread don't care about the execution order of the instructions of the main thread. They only see their local caches of the same memory location change from 0 to 5, 5 to 0, and back to 5 repeatedly. When a new thread lands on a CPU with the old 0 cache value, it will hang.


> CPUs not running the main thread don't care about the execution order of the instructions of the main thread.

On x86, they do (the x86 family is unusual in having strong memory ordering), but that's not the issue here.

> They only see their local caches of the same memory location change from 0 to 5, 5 to 0, and back to 5 repeatedly.

Their local caches of that memory see only a 5, since at the moment they read that cache line, the value in memory is 5; the operating system ensures that the write of the 5 value by the main thread is flushed to memory[*] before the main thread starts the child thread, and also that the cache of the child thread does not have stale data from before that moment. That memory location is only set back to 0 after all the child threads have exited, so there's no instant where the child thread could read a 0 on that location from main memory into its cache.

> When a new thread lands on a CPU with the old 0 cache value, it will hang.

When a new thread lands on a CPU core with an old 0 cache value for that memory location (which could happen if that CPU core had been running the main thread, and the main thread was migrated to another CPU core before it could set it back to 5), it will still see a 5 at that memory location, because the operating system invalidates the cache of a CPU core when necessary before starting a new thread on it.

[*] Actually, it only has to be flushed as far as the last level cache, or the "point of unification" in ARM terminology; I simplified a lot in this explanation.


The code runs in what's effectively a single-threaded context. CPUs running other threads will be involved when joining and launching threads, this should suffice as synchronization.


Your assessment is completely, totally, and verifiably wrong. It's obnoxious, but not surprising, how many commenters have chimed in with "program has bug BLAH" when they provably did not try to run the program themselves.

Because if you did attempt to run the program you'd find that changing numThreads to a constexpr makes no difference.


> Non-atomic variables have no guarantee on cache consistency for different threads

Atomicity and cache coherence are different things. "Atomic" means that the access to the element will be done in a single operation and can only observe exactly the state resulting from other atomic accesses to the same value. For C-syntax variables, this is pretty much limited to multi-word accesses (you also sometimes talk about atomic compare-and-set instructions, but those don't appear as part of the language per se).

Cache coherence is actually guaranteed on almost all systems, you don't need to worry about it on anything big enough to be running Windows (in the embedded world we get to fight it though).

The other demon in this space is memory reordering, but atomics don't speak to that at all.


You seem to have missed the part where an actual MS employee confirmed it was a bug in their API.


He just read OP's code, the C++ standard, and the Microsoft Learn documentation and drew that conclusion. That's a rather hasty determination. Unless he read the actual Windows lock implementation and found a bug there, I don't think his conclusion is correct.


> That's a rather hasty determination.

I hope you realize how deeply ironic this statement is. If you read the comments you'll find he even produced a slightly reduced repro. And in other comments he tried minor tweaks like the one you suggested.

You have a thesis that the program has a bug. (It doesn't.) Go ahead and test your thesis and report back.


I understand why this is the case, but it's also extremely frustrating:

> It is extremely difficult for programmer-users to report bugs against the Windows API (we're supposed to direct you to Feedback Hub, but you may as well transmit your message into deep space). I've filed OS-49268777 "SRWLOCK can deadlock after an exclusive owner has released ownership and several reader threads are attempting to acquire shared ownership together" with a slightly reduced repro.

> Thanks for doing your homework and creating a self-contained repro, plus pre-emptively exonerating the STL. I've filed this OS bug as a special favor - bug reports are usually off-topic for r/cpp. The microsoft/STL GitHub repo is the proper channel for reporting STL misbehavior; it would have been acceptable here even though the root cause is in the Windows API because this situation is so rare. If you see STL misbehavior but it's clearly due to a compiler bug, reporting compiler bugs directly to VS Developer Community is the proper thing to do.


The best way I know of to report bugs to Windows is to report them as documentation bugs. See for example https://github.com/MicrosoftDocs/cpp-docs/pull/3526.


It looks like it was filed in Nov '21, and in May '23 they merged in the "fix" by adding "it's approximate" to the docs ( https://github.com/MicrosoftDocs/cpp-docs/commit/447b5d8a781... ), so I'm not sure this is the best approach for actual bugs.


Clever!


Clever until they just fix the documentation to match what the implementation is doing.


> It is extremely difficult for programmer-users to report bugs against the Windows API

I can't imagine living in this hell. When I find bugs in Linux, I E-mail the actual engineers directly and get responses in under 24 hours: https://lore.kernel.org/lkml/Zcb3_fdyJWUlZQci@gmail.com/


That's pretty cool. No comments were left in the code after the revert, though. Are they relying on their memories and commit history as documentation of invariants? How do they prevent the same mistake from being made again in the future?


Microsoft engineers have emails too.


And how would you know what email to contact in a case like this one? It’s not about “having email”


You can either go down the route of finding the team that owns the code and then contacting someone on that team, or by contacting someone you know from the company to look it up for you or report the bug on your behalf.


> the route of finding the team that owns the code and then contacting someone on that team

Nice circle; now the reader only has to fill in the rest of the owl.

The benefit of the open development model of Linux is that you don't need "someone you know" on the inside, and you don't have to essentially spam whatever contacts you can dig up until you find someone who takes pity on you. You have actually publicly available developers (from many different companies, including hardware manufacturers) as well as real bug trackers where you can check whether other users have hit the same issue.


And Twitter/X accounts.

I have had a lot more things fixed in Windows or MSVC from nagging devs on there than from reporting through any official channel.


+1. If you want to reach a real engineer, they're going to be spending their free time on sites like X, not on some community feedback and bug reporting form.


Nice to know that it's not just Apple to whom reporting bugs is hopeless. :)


I generally assume that's the case for any large company.

Sometimes I get pleasantly surprised, but generally speaking the internal incentives are skewed against it: the primary focus is whatever the roadmap says, followed by tickets from paying clients. Public bugs generally have a very low hit ratio, so they're unrewarding; unless you manage to snipe one of the company's employees (either nerd-snipe them, or interest/shock them enough to raise the issue internally), it's like playing the lottery.


> I generally assume that's the case for any large company.

What works to some degree are dedicated maintenance teams providing development support. If their main task is fixing bugs and they are evaluated on this basis, support tickets reporting real bugs have a good chance to receive the required attention.

However, there is always the temptation for management to redirect resources from those teams. But at least in the B2B area costly customer escalations can remind management of the importance of good maintenance.


> I generally assume that's the case for any large company.

Despite Google partly losing its marbles recently, reporting bugs to Chromium still works very well.


I wonder if the open source element of the project keeps them “honest” to some extent?


Not my experience. When I report a bug in Chromium, it’s usually quickly verified as legit, and then left for dead forever.


Depends on the team, I think. I've had no luck at all with the OS/framework and "core" apps bugs, except getting the report that it has been fixed, and I should install the new version of the OS. Only to find out the bug wasn't solved.

The Logic team, however, has been helpful, and in one or two cases (that were discussed in musician's forums) went out of their way.


Former AWS here.

My literal job for the last part of my time at AWS was "help triage bugs in the AWS SDK." This is by far the best repro I've ever seen for such an in-depth event.

Most of the tickets you get in open ticket trackers are incomplete [ https://github.com/boto/boto3/issues/4011 ], nonsensical [ https://github.com/boto/boto3/issues/4018 ], or weird [ https://github.com/boto/boto3/issues/358 ].


The amount of attention and care a bug report or a piece of feedback receives is inversely proportional to how easy it is to file it.


Like the comment says, filing an issue on GitHub would've also eventually led to a proper internal report, so the system isn't wholly reliant on one guy trawling Reddit.

I have many complaints about Win32 (having half a dozen different error code types, for one), but outright bugs are really rare.


Once upon a time, you could buy various things from MS that came with support incidents. I had an MSDN subscription that came with two per year. Using an incident got you an actual support engineer who would be helpful and escalate issues if necessary. And, if your issue turned out to be a real bug of any significance in an MS product, your support incident would be credited back.

This was great for developers (real support was available, and spam was discouraged), and it was great for MS (they found real issues that affected paying customers and they got feedback to improve their documentation).

I wonder whether this program still exists. I have the impression the overall quality of MS documentation has declined.


Generally as far as first line support goes, everything is outsourced to India. Cheaper, on paper.

Anecdotally, I tried to get support for something unrelated: I'm trying to use IMAP sync support in Outlook.com, but it refuses to work properly with iCloud IMAP and doesn't give any error message either. As a paying M365 subscriber I expect proper support. I tried over five times to get support, and every time it ended in frustration. Every time I got someone who either didn't understand the product, claimed IMAP support is unavailable and deprecated (it is not! I was able to set up a different IMAP provider just fine), or redirected me to the Windows or Office support team, who then couldn't help me and closed my ticket.


It often feels like they try to drag the ticket out as long as possible by asking irrelevant questions, then, as you said, redirect you or close the ticket after wasting an appropriate amount of time.

I cannot prove it, but I bet there is an internal number of replies before they can close without penalty.


If it's any consolation, our company pays for 20,000 GMail licenses and we seem unable to escalate any issue at all to Google, even major issues such as their clearly not working spam filtering, emails being lost, incorrect deduplication of emails, or their random IMAP throttling.


Is there any vendor of actually good email-as-a-service? O365/Exchange/Outlook/Hotmail is a mess. Google has a support problem. Fastmail has an offline email problem. iCloud is not obviously suitable for professional use.

What’s left?


There are multiple providers out there that have their own stacks (Google, Microsoft, Tutanota, Protonmail to some extent and others) and there are those that use more common combinations (basically just managed mail-in-a-box). Nothing perfect though, so pick your poison.

Some issues are also due to the ecosystem itself. Avoiding POP3 goes a long way for example.


Avoiding POP3 is great. Some people like to say the future is JMAP, but those same people provide such a weak mobile email app that I find it hard to believe. Maybe the protocol can do better than the app by the same vendor?


IMAP, as horrible as it is on a technical level, does the job just fine.


Many VPS/web hosters provide email services along with their custom-domain support, e.g. https://www.inmotionhosting.com/support/email/. It’s worthwhile to look into that space, because you are getting dedicated and responsive customer support instead of an anonymous black-box mess.


Good question.

It really is a tough choice. Microsoft has the best clients for desktop and mobile platforms (including iOS and iPadOS), with really good offline features and so on. But it's still Microsoft, with Exchange behind it...

A better product is Fastmail. But it's an Australian company with US servers, and it has a lot of downsides concerning offline usage and other privacy points.

What else? I try to avoid Google as much as I can... Well, there is Proton, with its Proton Suite getting better and better. There is the German mailbox.org (weaker 2FA, but overall a good privacy mail provider with custom-domain support). There is migadu.com from Switzerland with EU servers (rented in OVH data centers), and some other players like Tuta.


How is O365/Exchange/Outlook a mess? As a user? As an admin? Genuinely curious. As an admin, I love O365.


As an occasional user: here are a few complaints:

There are too many variants, all only vaguely compatible.

The iOS integration requires enrolling one’s phone in the email provider’s MDM, at least to some extent. This is nice if you’re an admin, but it’s not so nice if you’re a user who uses (in accordance with company policy!) a personal device.

The integration with Mail.app is abysmal. It makes my memories of Eudora seem happy.

Signing in is a real PITA.

The spam classifier is comically poor. I’m honestly surprised that (hundreds of) millions of dollars aren’t lost every year when a (paying, enterprise) customer emails someone at a different business (from the native app!), they reply with an utterly non-spammy reply, and the reply is classified as spam. Seriously, the open source spam classifiers from the early 2000s understand threading — how can Microsoft fail to classify individual replies as not-spam? Google is far better. Fastmail is far better. Everything is far better.

I will give MS some credit: the iOS Outlook app is actually pretty nice.


Outlook is a once-great product that has been left to rot by a Microsoft with little lineage to the great company from the 90s and early 00s that created it.

Outlook debuted Cached Exchange Mode in the early 00s, popularizing "offline first" before it was known as that.

Now: The "new" Outlook can't even show folder unread counts correctly, even when fully online. It seems to only load a small subset of messages locally, only populating folders when you scroll past the point it loaded. (In classic Outlook, this was a setting—I understand loading all mail was not enabled by default—but it could be enabled. No longer.) It sometimes gets stuck where it won't show new mail until restarted. (Gmail has this bug too.) It forgets open mail windows when restarted. It forgets expanded folders in the folder pane when restarted (but only sometimes).

Microsoft removed the ability to show the mail/contacts/calendar navigation bar below the folder pane, and forced it to be shown on its own huge vertical bar, almost all of which is wasted space. For good measure, they did this in classic Outlook as well as "new" Outlook. There was massive backlash to this, and Microsoft plowed forward anyway. On Windows there is/was a registry setting to revert this (but intentionally, no user-facing setting). I have not checked on Mac.

To the sibling comment: Outlook for iOS is indeed great, probably only because it was an acquisition. It is not in Microsoft's DNA to build an app like this themselves anymore.

As an admin: Microsoft seems to redo the Office 365 admin interface every 2-3 years. It is an incomprehensible mess. I am also a Google Workspace admin for the past several years, and theirs is far better, and it's more or less stable over the long term.

Office 365 has been hacked by state actors recently.

I still like the Outlook UI and feature set better than Gmail (despite the "new" Outlook being a major regression), so I begrudgingly stay with Outlook/Exchange because I dislike it less than Google Workspace.

Fastmail does not give signs that they are a relevant company—they created JMAP and basically did nothing with it. Why not make a first-class Windows/Mac client, offline first, with powerful organizational features, like the Outlook of yore? Or at least, contribute to adding first class JMAP support to Thunderbird? This is your sole business.


The point about Fastmail in particular is a good one! You nailed it with your comment. With JMAP especially, they built themselves huge potential and a huge feature set, but have not made use of it (especially not in their own products). Offline usage above all...


Microsoft randomly blocks entire subnets of mailservers. If you care at all about independent mail hosting (and you should if you want email to not turn into yet another proprietary walled garden) then you should not use Microsoft's mail products.


That's Premier support, though now with a bunch of extra layers of contractors who can't do anything other than follow a script, capture traces and wait on the next level. In my experience it's been a time consuming waste of effort. I've had a lot better luck harvesting names and business cards from the teams at conferences and demo days, then going direct to the source.


Apple still has this program. I haven't worked as an iOS developer for a few years now, but I remember their paid "technical support incidents" were extremely high quality. It was $50 per incident, but I think that was just to reduce spam.

I remember being astounded by the technical depth of a particular answer and looking up the engineer on LinkedIn. He had 20+ years of experience at Apple - and it showed in his answer.


Mullvad had to go public[0] with a security leak before Apple even responded to their reports against the release candidates for macOS 14.

[0]https://mullvad.net/en/blog/bug-in-macos-14-sonoma-prevents-...


You can still pay MS for support. Whether you'll get the answer you want, as the others have noted in sibling comments, is highly variable.


Well, maybe last century.

At any point in this century, the response to those support incidents from MS was always to deny there was a problem, and if it was a known one, to try to gaslight the customer in a direction contrary to solving it.

I have seen organizations lose far too many person-hours trying to satisfy MS support and apply what it recommended, when the real solution was often reachable with an hour or two of research in third-party knowledge bases.


ISTR I got an actually useful outcome from a support incident related to a bug in the runtime libraries shipped with Visual Studio 2005. Support gave me a hotfix.

Sadly the hotfix had its own little bug: installing it took about 20 hours. Office productivity was rather low for the rest of the day. I don’t know everything that goes on under the hood with Microsoft’s installers, but wow they’re slow.

I, blissfully, don’t use Visual Studio on Windows any more :)


I actually remember reporting a crypto bug way back through a similar program in Win98, where certain keys would not work if the highest bit was set in the key material. It got fixed.


> It is extremely difficult for programmer-users to report bugs against the Windows API (we're supposed to direct you to Feedback Hub, but you may as well transmit your message into deep space).

:'‑(


This is an issue with a lot of MS products. I honestly have no clue how much money one would have to throw at MS to get a bug fixed.

For example Microsoft's *own* Pluton-enabled platforms fail Windows' Device Health Attestation checks due to an incomplete chain (https://call4cloud.nl/2023/04/are-you-there-intune-its-me-ha...).


I've noticed a lot of big products have a user feedback cycle that goes something like this:

- Create new feedback tracker

- Direct feedback to tracker

- Stop paying any attention to tracker

- Tracker is hundreds of pages of users shouting into the void, and much of it out-of-date

- Delete everything

- Create new feedback tracker...


The fundamental problem here is that they have a billion users and most of them don't know what they're talking about. If you create a simple way to contact the company it will soon be full of messages from end users who can't even articulate what their problem is but it's usually some kind of malware or user error and is definitely not a problem with whatever component they're reporting the issue against.

What you really need is a way to report problems which is high friction. You have to submit a git pull request of your ssh public key so you can transmit your bug report via sftp. Now they only get bug reports from people who can figure out how to do that and can actually pay attention to them because it filters out all the spam from people asking Microsoft how to connect their Android to a Mac.


Microsoft makes ~$16 billion a quarter in profit, of which it returns ~$10 billion to investors [1].

They could spend $200 million a quarter on decent customer support without making much of a dent in their bottom line.

[1] https://www.microsoft.com/en-us/investor/earnings/fy-2023-q2...


$200M is approximately $0.15/user. How much support do you expect to get for that?


How many of these 1 billion users actually need support? Only a tiny fraction.


Uh... what? If there was a free number you could call to get competent technical support, people would spend their entire day on the phone with it instead of reading documentation or hiring IT staff.


Not just high friction: some kind of formal verification or certification step would be interesting.


Nah, you don't want bureaucratic gatekeeping. You want the 14 year old in Kenya or Ukraine with technical competence but no ID to be able to report the bug he finds.

The point is to exclude people who don't know what they're talking about, not people who can't pay registration fees or produce documents.


Delegating first-level bug triage to the community can also work, as long as your company is well liked enough to motivate volunteers. For that you just need a public bug tracker that everyone can post to, while only allowing regular users (e.g. those who have made useful bug reports before) to confirm that something is a not-yet-known bug and forward it to your developers. This way, once you are notified of a real bug, you will have access to not just that one report but also potential duplicates that might provide interesting information, reports that might never have been filed if the first level were too high friction.


There's nothing worse than having a problem with a product, finding 250 other people on the feedback tracker that have had the same problem over the last three years, and the only official response is some support person on the first page who's saying your feedback is very important to us, and have you tried [troubleshooting that won't work]?


At that scale they also have 250,000 people who didn't try the obvious troubleshooting first, though. But yes, it's annoying.


They often seem to end up like this:

Topic: Can't select blue wallpaper

Body: Hi, for some reason I can't select blue as a wallpaper colour. I can select any other colour, just not blue. I have Enable All Colours ticked. Anything else I'm missing?

Reply #1: Hi I have this problem too, does anyone know?

Reply #2: Same problem here.

Reply #3: Same problem, I can select any colour except blue.

Reply #4: Hi there User344925. Let me first say that your feedback is incredibly, impossibly important to us. I'm Bob and I'll be your Licensed Support Person today. Let's get started and see if we can solve your problem. I understand you're having trouble setting the colour blue for your wallpaper in personalisation settings. This is a common problem to have, and I'm pleased to say there is a simple fix available. Open Settings -> Personalisation -> Advanced Settings -> Advanced Personalisation Settings, and tick "Enable All Colours". Now you should be able to set any colour you like. Please remember to mark this question as SOLVED at the top and mark my post as the Approved Answer if this solved your problem. Have a great day!

Replies #5-#250: Users with the same problem.


I've run into this too; the Feedback Hub seems to be staffed by first-line support types.

I also found a bug in a Win32 API, and the Feedback Hub told me to reboot my PC.


Unfortunately, the opposite of that is everybody trying to file random things that aren't actionable. See the collection of mail sent to the curl maintainer…

https://github.com/bagder/emails/blob/main/2015/2015-06-08.m...



