The moral of the story for software developers is to be aware of false sharing and cache line bouncing when writing multithreaded code.
Seemingly innocent code like this will most likely cause completely unnecessary inter-core traffic (assuming that the compiler has laid out a and b adjacently, and they both fall within the boundaries of a single cache line):
    unsigned a, b;

    void thread_a(void) {
        for (;;) a++;
    }

    void thread_b(void) {
        for (;;) b++;
    }
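One common way to avoid this is to force each counter onto its own cache line. The following is only a rough sketch, not a definitive fix: the 64-byte line size, the CACHE_LINE_SIZE macro, and the counters struct are my own assumptions for illustration (64 bytes is typical on x86-64, but check your target), and it needs a C11 compiler.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stdalign.h> /* alignas, alignof (C11) */
#include <stddef.h>   /* offsetof */

/* Assumption: 64-byte cache lines. Adjust for your hardware. */
#define CACHE_LINE_SIZE 64

/* Each member is aligned to a cache line boundary, so the compiler pads
   the struct and a and b can never end up in the same line. */
struct counters {
    alignas(CACHE_LINE_SIZE) unsigned a;
    alignas(CACHE_LINE_SIZE) unsigned b;
};

struct counters counters;

/* Compile-time check that the two counters are at least one line apart. */
static_assert(offsetof(struct counters, b) - offsetof(struct counters, a)
                  >= CACHE_LINE_SIZE,
              "a and b must not share a cache line");

void thread_a(void) {
    for (;;) counters.a++;
}

void thread_b(void) {
    for (;;) counters.b++;
}
```

The trade-off is memory: the struct grows from 8 bytes to 128 with these assumptions, so this is worth doing only for data that is actually written concurrently by different cores.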
If you want to dig deeper, Appendix C of "Is Parallel Programming Hard, And, If So, What Can You Do About It?" by Paul McKenney (http://kernel.org/pub/linux/kernel/people/paulmck/perfbook/p...) provides a very detailed description as well. It really helped improve my understanding of how memory barriers and atomics work.