Please just don't use spinlocks in userland code. It's really not the appropriate mechanism.
Your code will look great in your synthetic benchmarks and then it will end up burning CPU for no good reason in the real world.
Loved this article. It showed how lacking my knowledge is in how operating systems implement concurrency primitives. It motivated me to do a bunch of research and learn more.
Notably the claim about how atomic operations clear the cache line in every CPU. Wow! Shared data can really be a performance limitation.
I don’t understand why I would need to care about this. Can’t my operating system and/or pthread library sort this out by itself?
Where do lock free algorithms fall in this analysis?
> The Linux kernel learned this the hard way. Early 2.6 kernels used spinlocks everywhere, wasting 10-20% CPU on contended locks because preemption would stretch what should’ve been 100ns holds into milliseconds. Modern kernels use mutexes for most subsystems.
That's not accurate: the scalability improvements in Linux are a result of broadly eliminating serialization, not something as trivial as using a different locking primitive. The BKL didn't go away until 2.6.37! As much as "spinlock madness" might make a nice little story, it's just simply not true.
I love that this article includes a test program at the bottom to allow you to verify its claims.
I didn't know about using alignment to avoid cache bouncing. Fascinating stuff
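For anyone else new to the alignment trick, here is a minimal sketch of what it looks like in C++. It assumes 64-byte cache lines (the `alignas(64)` figure quoted elsewhere in the thread); the struct and function names are made up for illustration and are not taken from the article. C++17 also defines `std::hardware_destructive_interference_size` in `<new>` for this purpose, though compiler support varies.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <thread>

// Assumption: 64-byte cache lines, matching the alignas(64) quoted in the thread.
constexpr std::size_t kCacheLine = 64;

// Hypothetical layout: both counters normally share one cache line, so
// writes from two cores keep invalidating each other's copy of that line.
struct PackedCounters {
    std::atomic<std::uint64_t> a{0};
    std::atomic<std::uint64_t> b{0};
};

// Same data, but each counter starts on its own cache line.
struct PaddedCounters {
    alignas(kCacheLine) std::atomic<std::uint64_t> a{0};
    alignas(kCacheLine) std::atomic<std::uint64_t> b{0};
};

static_assert(alignof(PaddedCounters) == kCacheLine, "counters are line-aligned");
static_assert(sizeof(PaddedCounters) == 2 * kCacheLine, "one full line per counter");

// Runs two threads, each incrementing only its own counter, and returns
// the combined count. Time this for both layouts to see the difference.
template <typename Counters>
std::uint64_t hammer(std::uint64_t iters) {
    Counters c;
    std::thread t1([&] {
        for (std::uint64_t i = 0; i < iters; ++i)
            c.a.fetch_add(1, std::memory_order_relaxed);
    });
    std::thread t2([&] {
        for (std::uint64_t i = 0; i < iters; ++i)
            c.b.fetch_add(1, std::memory_order_relaxed);
    });
    t1.join();
    t2.join();
    // Sanity check: each counter was touched only by its own thread.
    assert(c.a.load() == iters && c.b.load() == iters);
    return c.a.load() + c.b.load();
}
```

Both `hammer<PackedCounters>(n)` and `hammer<PaddedCounters>(n)` return the same total; only the running time should differ, and by how much depends on the machine.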
>Critical section under 100ns, low contention (2-4 threads): Spinlock. You’ll waste less time spinning than you would on a context switch.
If your sections are that short then you can use a hybrid mutex and never actually park. Unless you're wrong about how long things take, in which case you'll save yourself.
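A rough sketch of the hybrid idea, assuming nothing fancier than `std::mutex`: try the lock a bounded number of times, then fall back to a normal blocking `lock()`. The class name and the spin limit of 100 are arbitrary placeholders, and production hybrid mutexes are considerably more refined than this.

```cpp
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical hybrid mutex: spin briefly in user space, then let the
// OS park the thread if the lock is still held.
class HybridMutex {
public:
    void lock() {
        // Short critical sections are usually released within these tries,
        // so the thread never has to go to sleep.
        for (int i = 0; i < kSpinLimit; ++i) {
            if (m_.try_lock()) return;
        }
        // The holder is slower than expected (or was preempted):
        // stop burning CPU and block until the lock is free.
        m_.lock();
    }
    bool try_lock() { return m_.try_lock(); }
    void unlock() { m_.unlock(); }

private:
    static constexpr int kSpinLimit = 100;  // arbitrary; tune by measuring
    std::mutex m_;
};

// Small demo: several threads bump a shared counter under the lock.
long count_with_hybrid(int threads, int iters) {
    assert(threads > 0 && iters >= 0);
    HybridMutex mu;
    long counter = 0;
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&] {
            for (int i = 0; i < iters; ++i) {
                std::lock_guard<HybridMutex> guard(mu);
                ++counter;
            }
        });
    }
    for (auto& th : pool) th.join();
    return counter;
}
```

If the estimate of the critical section length turns out to be wrong, the only cost is the bounded spin before the fallback, which is the "save yourself" case.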
>alignas(64) in C++
Exists so you don't have to guess, although in practice it'll basically always be 64.
The code samples also don't obey the basic best practices for spinlocks for x86_64 or arm64. Spinlocks should perform a relaxed read in the loop, and only attempt a compare and set with acquire order if the first check shows the lock is unowned. This avoids hammering the CPU with cache coherency traffic.
Similarly the x86 PAUSE instruction isn't mentioned, even though it exists specifically to signal spin sections to the CPU.
Spinlocks outside the kernel are a bad idea in almost all cases, except dedicated non-preemptible setups; use a hybrid mutex. Spinning for consumer threads can be done in specialty exclusive thread-per-core cases where you want to minimize wakeup costs, but that's not the same as a spinlock, which would cause any contending thread to spin.
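For reference, the relaxed-read-then-compare-and-set loop with a PAUSE hint described above could look roughly like this. It is an illustration only, not the article's code: the class and helper names are invented, the arm64 `yield` branch assumes GCC or Clang inline assembly, and the caveats about using spinlocks in userland still apply.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>
#if defined(__x86_64__) || defined(__i386__) || defined(_M_X64) || defined(_M_IX86)
#include <immintrin.h>  // _mm_pause
#endif

// Hypothetical helper: tell the CPU we are in a spin-wait loop.
inline void cpu_relax() {
#if defined(__x86_64__) || defined(__i386__) || defined(_M_X64) || defined(_M_IX86)
    _mm_pause();  // emits the x86 PAUSE instruction
#elif defined(__aarch64__) && (defined(__GNUC__) || defined(__clang__))
    asm volatile("yield" ::: "memory");  // assumption: GCC/Clang inline asm
#else
    std::this_thread::yield();  // portable fallback, not a true CPU hint
#endif
}

class TtasSpinlock {
public:
    void lock() {
        for (;;) {
            // Cheap relaxed read first: waiters spin on a shared copy of the
            // cache line instead of fighting for exclusive ownership of it.
            if (!locked_.load(std::memory_order_relaxed)) {
                bool expected = false;
                // Only now attempt the acquiring compare-and-set.
                if (locked_.compare_exchange_weak(expected, true,
                                                  std::memory_order_acquire,
                                                  std::memory_order_relaxed)) {
                    return;
                }
            }
            cpu_relax();
        }
    }
    void unlock() { locked_.store(false, std::memory_order_release); }

private:
    std::atomic<bool> locked_{false};
};

// Small demo: several threads bump a shared counter under the spinlock.
long count_with_spinlock(int threads, int iters) {
    assert(threads > 0 && iters >= 0);
    TtasSpinlock lock;
    long counter = 0;
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&] {
            for (int i = 0; i < iters; ++i) {
                lock.lock();
                ++counter;
                lock.unlock();
            }
        });
    }
    for (auto& th : pool) th.join();
    return counter;
}
```

Note that every contending thread still burns a core while it waits here, which is exactly the behavior a hybrid mutex avoids once the spin budget runs out.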