> If we set _foo to 0 and have two threads that both execute incl (_foo) 10000 times each, incrementing the same location with a single instruction 20000 times, is guaranteed not to exceed 20000, but it could (theoretically) be as low as 2.
Can someone explain the scenario where this test would result in _foo = 2? The lowest theoretical value I can understand is foo_ = 10000 (all 10000 incls from thread 1 are executed between one pair of load and store in thread 2 and hence lost).
You're almost there! Basically that swapping you outlined occurs twice, once on each side.
1. Both threads read '0', beginning their first incl instruction (A: 0, B: 0, Store: 0)
2. Thread A completes execution of all BUT ONE incl, writing it's second to last value to the store (A: 9999, B: 0, Store: 9999)
3. Thread B increments it's '0' to '1' (A: 9999, B: 1, Store: 9999)
4. Thread B writes it's '1' to the store (A: 9999, B: 1, Store: 1)
5. Thread A begins it's final incl and reads the '1' that thread B just stored (A: 1, B: 1, Store: 1)
6. Thread B executes all remaining incl instructions, writing its final value to the store (A: 1, B:10000, Store: 10000)
7. Thread A continues it's final incl and increments it's '1' to '2' (A: 2, B:10000, Store: 10000)
8. Thread A stores it's final '2' result (A: 2, B:10000, Store: 2)
9. All instructions have executed, and the result is '2'
It is, of course, completely theory and extremely unlikely under real use as the authors graph shows, but it's one of those threading 'gotchas' that lead to occasionally unpredictable results.
Can someone explain the scenario where this test would result in _foo = 2? The lowest theoretical value I can understand is foo_ = 10000 (all 10000 incls from thread 1 are executed between one pair of load and store in thread 2 and hence lost).