>Critical section under 100ns, low contention (2-4 threads): Spinlock. You’ll waste less time spinning than you would on a context switch.
If your sections are that short then you can use a hybrid mutex and never actually park. Unless you're wrong about how long things take, in which case you'll save yourself.
>alignas(64) in C++
std::hardware_destructive_interference_size
Exists so you don't have to guess, although in practice it'll basically always be 64.The code samples also don't obey the basic best practices for spinlocks for x86_64 or arm64. Spinlocks should perform a relaxed read in the loop, and only attempt a compare and set with acquire order if the first check shows the lock is unowned. This avoids hammering the CPU with cache coherency traffic.
Similarly the x86 PAUSE instruction isn't mentioned, even though it exist specifically to signal spin sections to the CPU.
Spinlocks outside the kernel are a bad idea in almost all cases, except dedicated nonpreemptable cases; use a hybrid mutex. Spinning for consumer threads can be done in specialty exclusive thread per core cases where you want to minimize wakeup costs, but that's not the same as a spinlock which would cause any contending thread to spin.
> Spinlocks outside the kernel are a bad idea in almost all cases, except dedicated nonpreemptable cases; use a hybrid mutex
Yeah, pure spinlocks in user-space programs is a big no-no in my book. If you're on the happy path then it costs you nothing extra in terms of performance, and if you for some reason slide off the happy path you have a sensible fall-back.
> std::hardware_destructive_interference_size Exists so you don't have to guess, although in practice it'll basically always be 64.
Unfortunately it's not quite true, do to e.g. spacial prefetching [0]. See e.g. Folly's definition [1].
[0] https://community.intel.com/t5/Intel-Moderncode-for-Parallel...
[1] https://github.com/facebook/folly/blob/d2e6fe65dfd6b30a9d504...
Some things from the article are debatable for sure, and some are maybe missing like the one you mention with PAUSE instruction, which I also have not been aware of, but generally speaking I thought it was a really good content. Lean system engineering skills applied to real world problems. I especially appreciated the examples of large-scale infra codebases doing it in practice.
Hybrid locks are also bad for overall system performance by maximizing local application performance. There is a reason default lock implementations from OS don't spin even a little bit.
> std::hardware_destructive_interference_size
Of course, this is just the number the compiler thinks is good. It’s not necessarily the number that is actually good for your target machine.
The PAUSE instruction isn't actually as good as it used to be. In, iirc, Skylake Intel massively increased the latency to improve utilisation under hyperthreading. The latency of this instruction is now really high.
Most people using spinlocks really care about latency, and many will have hyperthreading disabled to reduce jitter
> Spinlocks outside the kernel are a bad idea in almost all cases, except dedicated nonpreemptable cases; use a hybrid mutex. Spinning for consumer threads can be done in specialty exclusive thread per core cases where you want to minimize wakeup costs, but that's not the same as a spinlock which would cause any contending thread to spin.
Very much this. Spins benchmark well but scale poorly.