charleslmunger • last Monday at 2:07 AM • 7 replies

>Critical section under 100ns, low contention (2-4 threads): Spinlock. You’ll waste less time spinning than you would on a context switch.

If your sections are that short then you can use a hybrid mutex and never actually park. Unless you're wrong about how long things take, in which case you'll save yourself.
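A minimal sketch of the spin-then-park idea, assuming a plain `std::mutex` as the blocking fallback. The class name `SpinThenBlockMutex` and the spin limit of 100 are made up for illustration; production hybrid mutexes usually spin on their own state word and park on a futex or similar, which this does not attempt.

```cpp
#include <mutex>

// Hypothetical hybrid lock: try a bounded number of times without blocking,
// then fall back to the OS-backed blocking path of std::mutex.
class SpinThenBlockMutex {
public:
    void lock() {
        // Phase 1: bounded spin. If the critical section really is short,
        // one of these attempts succeeds and the thread never parks.
        for (int i = 0; i < kSpinLimit; ++i) {
            if (inner_.try_lock()) {
                return;
            }
        }
        // Phase 2: the estimate was wrong (or the owner got preempted),
        // so block instead of burning CPU.
        inner_.lock();
    }

    bool try_lock() { return inner_.try_lock(); }

    void unlock() { inner_.unlock(); }

private:
    static constexpr int kSpinLimit = 100;  // assumed value, tune by measurement
    std::mutex inner_;
};
```

It satisfies the usual lockable requirements, so it can be used with `std::lock_guard<SpinThenBlockMutex>`.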

>alignas(64) in C++

    std::hardware_destructive_interference_size

Exists so you don't have to guess, although in practice it'll basically always be 64.
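For illustration, a sketch of padding a per-thread counter with that constant rather than a hard-coded 64. The names `kCacheLine` and `PaddedCounter` are hypothetical, and the fallback to 64 for standard libraries that don't ship the constant is an assumption.

```cpp
#include <atomic>
#include <cstddef>
#include <new>

// Use the standard constant when the library provides it; otherwise fall
// back to 64 (an assumption that holds on most current x86_64 parts).
#ifdef __cpp_lib_hardware_interference_size
inline constexpr std::size_t kCacheLine =
    std::hardware_destructive_interference_size;
#else
inline constexpr std::size_t kCacheLine = 64;
#endif

// Each counter is aligned to, and therefore padded out to, a multiple of
// kCacheLine, so adjacent array elements never share a destructive
// interference region.
struct alignas(kCacheLine) PaddedCounter {
    std::atomic<long> value{0};
};

static_assert(alignof(PaddedCounter) == kCacheLine);
static_assert(sizeof(PaddedCounter) % kCacheLine == 0);
```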

The code samples also don't obey the basic best practices for spinlocks for x86_64 or arm64. Spinlocks should perform a relaxed read in the loop, and only attempt a compare and set with acquire order if the first check shows the lock is unowned. This avoids hammering the CPU with cache coherency traffic.
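Roughly what that looks like, as a sketch rather than a vetted implementation (the class name `TtasSpinlock` is made up here): spin on a relaxed load, and only issue the acquire compare-exchange once the lock looks free.

```cpp
#include <atomic>

// Test-and-test-and-set spinlock sketch.
class TtasSpinlock {
public:
    void lock() noexcept {
        for (;;) {
            // Read-only wait: the cache line can stay in shared state, so
            // waiters do not generate coherency traffic while spinning.
            while (locked_.load(std::memory_order_relaxed)) {
            }
            // The lock looked free; now try to take it with acquire order.
            bool expected = false;
            if (locked_.compare_exchange_weak(expected, true,
                                              std::memory_order_acquire,
                                              std::memory_order_relaxed)) {
                return;
            }
        }
    }

    bool try_lock() noexcept {
        bool expected = false;
        return !locked_.load(std::memory_order_relaxed) &&
               locked_.compare_exchange_strong(expected, true,
                                               std::memory_order_acquire,
                                               std::memory_order_relaxed);
    }

    void unlock() noexcept {
        locked_.store(false, std::memory_order_release);
    }

private:
    std::atomic<bool> locked_{false};
};
```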

Similarly, the x86 PAUSE instruction isn't mentioned, even though it exists specifically to signal spin sections to the CPU.
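A sketch of how a spin loop might issue it, assuming the `_mm_pause` intrinsic on x86 and a `yield` hint on arm64 with GCC/Clang. The helper names `cpu_relax` and `spin_while_set` are hypothetical.

```cpp
#include <atomic>

#if defined(__x86_64__) || defined(__i386__) || defined(_M_X64) || defined(_M_IX86)
#include <immintrin.h>
// PAUSE tells the core this is a spin-wait loop.
inline void cpu_relax() noexcept { _mm_pause(); }
#elif defined(__aarch64__)
// arm64 spin hint (GCC/Clang inline asm).
inline void cpu_relax() noexcept { asm volatile("yield" ::: "memory"); }
#else
// Unknown target: no hint available, do nothing.
inline void cpu_relax() noexcept {}
#endif

// Spins while `flag` is set, issuing the CPU hint on every iteration.
// Gives up after `max_spins` iterations and returns how many were used,
// so the caller can fall back to blocking.
inline unsigned spin_while_set(const std::atomic<bool>& flag,
                               unsigned max_spins) noexcept {
    unsigned spins = 0;
    while (spins < max_spins && flag.load(std::memory_order_relaxed)) {
        cpu_relax();
        ++spins;
    }
    return spins;
}
```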

Spinlocks outside the kernel are a bad idea in almost all cases, except dedicated nonpreemptable cases; use a hybrid mutex. Spinning for consumer threads can be done in specialty exclusive thread per core cases where you want to minimize wakeup costs, but that's not the same as a spinlock which would cause any contending thread to spin.

Replies

raggi • last Monday at 2:46 AM

> Spinlocks outside the kernel are a bad idea in almost all cases, except dedicated nonpreemptable cases; use a hybrid mutex. Spinning for consumer threads can be done in specialty exclusive thread per core cases where you want to minimize wakeup costs, but that's not the same as a spinlock which would cause any contending thread to spin.

Very much this. Spins benchmark well but scale poorly.

magicalhippo • last Monday at 2:33 AM

> Spinlocks outside the kernel are a bad idea in almost all cases, except dedicated nonpreemptable cases; use a hybrid mutex

Yeah, pure spinlocks in user-space programs are a big no-no in my book. With a hybrid mutex, if you're on the happy path then it costs you nothing extra in terms of performance, and if you for some reason slide off the happy path you have a sensible fall-back.

charleshn • last Monday at 4:04 AM

> std::hardware_destructive_interference_size Exists so you don't have to guess, although in practice it'll basically always be 64.

Unfortunately it's not quite true, due to e.g. spatial prefetching [0]. See e.g. Folly's definition [1].

[0] https://community.intel.com/t5/Intel-Moderncode-for-Parallel...

[1] https://github.com/facebook/folly/blob/d2e6fe65dfd6b30a9d504...

menaerus • last Monday at 11:10 AM

Some things from the article are debatable for sure, and some are maybe missing, like the PAUSE instruction you mention, which I also wasn't aware of, but generally speaking I thought it was really good content. Lean system engineering skills applied to real-world problems. I especially appreciated the examples of large-scale infra codebases doing it in practice.

surajrmal • last Monday at 4:28 AM

Hybrid locks are also bad for overall system performance: they maximize local application performance at the rest of the system's expense. There is a reason default lock implementations from the OS don't spin even a little bit.

saagarjha • last Monday at 4:20 AM

> std::hardware_destructive_interference_size

Of course, this is just the number the compiler thinks is good. It’s not necessarily the number that is actually good for your target machine.

nly • last Monday at 9:41 AM

The PAUSE instruction isn't actually as good as it used to be. In Skylake, iirc, Intel massively increased its latency to improve utilisation under hyperthreading. The latency of this instruction is now really high.

Most people using spinlocks really care about latency, and many will have hyperthreading disabled to reduce jitter.
