Do you mean the drop they later used to analyze cache-line coherency when adding cores? They improved this, if I understood the later results correctly.
Can anyone explain the use of 'round-robin' to describe multi-node scenarios and 'fill-first' for single-node scenarios? I initially assumed they were describing thread schedulers, but that doesn't quite make sense in the context of these tests. Thanks in advance.
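To make the question concrete, here is the scheduler-style reading I had in mind (purely my own assumption, not the article's definitions; `nodes` and `cores_per_node` are made-up illustrative numbers):

```cpp
// Which NUMA node does the i-th thread land on under each policy?
#include <cstdio>

int round_robin_node(int thread_idx, int nodes) {
    return thread_idx % nodes;                  // spread threads across nodes
}

int fill_first_node(int thread_idx, int nodes, int cores_per_node) {
    int node = thread_idx / cores_per_node;     // pack one node before the next
    return node < nodes ? node : node % nodes;  // wrap once all nodes are full
}

int main() {
    const int nodes = 2, cores_per_node = 4;
    for (int t = 0; t < 8; ++t)
        std::printf("thread %d -> round-robin node %d, fill-first node %d\n",
                    t, round_robin_node(t, nodes),
                    fill_first_node(t, nodes, cores_per_node));
}
```

Is that roughly the distinction being drawn, or does it refer to memory placement instead?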
Most of these optimizations scale with the number of cores, so the more cores (and hence sockets/NUMA nodes), the greater the benefit. Desktop-ish systems with ~4 cores don't see much gain, but we didn't introduce any performance degradation either.
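As a rough illustration of the kind of pattern involved (a generic sketch, not our actual code): per-core, cache-line-padded state avoids coherency traffic, which is exactly the sort of win that grows with core count and is barely visible on a ~4-core desktop.

```cpp
// Compare a single contended counter against per-core, cache-line-padded slots.
#include <algorithm>
#include <atomic>
#include <chrono>
#include <cstdint>
#include <cstdio>
#include <thread>
#include <vector>

struct alignas(64) PaddedCounter {               // 64 bytes: typical cache-line size
    std::atomic<std::uint64_t> value{0};
};

int main() {
    const unsigned cores = std::max(1u, std::thread::hardware_concurrency());
    std::vector<PaddedCounter> per_core(cores);  // one private slot per core
    std::atomic<std::uint64_t> shared{0};        // single contended counter

    auto run = [&](bool sharded) {
        auto start = std::chrono::steady_clock::now();
        std::vector<std::thread> threads;
        for (unsigned i = 0; i < cores; ++i) {
            threads.emplace_back([&, i] {
                for (int n = 0; n < 1000000; ++n) {
                    if (sharded)
                        per_core[i].value.fetch_add(1, std::memory_order_relaxed);
                    else
                        shared.fetch_add(1, std::memory_order_relaxed);
                }
            });
        }
        for (auto& t : threads) t.join();
        return std::chrono::duration<double>(
                   std::chrono::steady_clock::now() - start).count();
    };

    // The shared counter bounces one cache line between every core; the sharded
    // version keeps each core in its own line, so its cost stays roughly flat
    // as the core count grows.
    std::printf("%u cores: shared %.3fs, sharded %.3fs\n",
                cores, run(false), run(true));
}
```

On a quad-core box the two variants are close; the gap widens as sockets and NUMA nodes are added.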