Hybrid locks are also bad for overall system performance, because they maximize local application performance at the expense of everything else running on the machine. There is a reason default lock implementations from OS don't spin even a little bit.
That depends on your workload. If you're making a game that's expected to use nearly 100% of system resources, or a real-time service pinned to specific cores, your local application is the overall system.
This is nonsense. If the lock isn't held, you don't spin to begin with. If the lock is held but released shortly after, the spinning avoids a context switch. If the maximum number of retries has been reached, the thread was going to sleep anyway, and the scheduler moves on to the next thread (which was only delayed by the few attempted spins). So in the worst case the next spin only happens once all the other queued-up threads have had their turn, and that's assuming you immediately run into another held lock.
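To make the three cases concrete, here's a minimal sketch of a spin-then-block lock in Rust. It is not how any particular OS or libc implements its mutex. `HybridLock`, `max_spins`, `run_counter_demo`, and the retry budget of 100 are all made up for illustration, and it simply wraps `std::sync::Mutex` rather than talking to the kernel directly.

```rust
use std::sync::{Arc, Mutex, MutexGuard, TryLockError};
use std::thread;

/// Hypothetical hybrid lock: spin a bounded number of times, then block.
pub struct HybridLock<T> {
    inner: Mutex<T>,
    max_spins: u32, // illustrative retry budget, not a tuned value
}

impl<T> HybridLock<T> {
    pub fn new(value: T, max_spins: u32) -> Self {
        HybridLock { inner: Mutex::new(value), max_spins }
    }

    pub fn lock(&self) -> MutexGuard<'_, T> {
        for _ in 0..self.max_spins {
            match self.inner.try_lock() {
                // Lock was free (or freed during the spin): no blocking wait.
                Ok(guard) => return guard,
                // Still held: hint the CPU that we're busy-waiting and retry.
                Err(TryLockError::WouldBlock) => std::hint::spin_loop(),
                // A previous holder panicked; recover the guard for this demo.
                Err(TryLockError::Poisoned(e)) => return e.into_inner(),
            }
        }
        // Retry budget exhausted: fall back to the blocking path,
        // which lets the OS put this thread to sleep.
        self.inner.lock().unwrap_or_else(|e| e.into_inner())
    }
}

/// Demo helper (hypothetical): several threads increment a shared counter.
pub fn run_counter_demo(threads: usize, iters: usize) -> usize {
    let lock = Arc::new(HybridLock::new(0usize, 100));
    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let lock = Arc::clone(&lock);
            thread::spawn(move || {
                for _ in 0..iters {
                    *lock.lock() += 1;
                }
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    let total = *lock.lock();
    total
}
```

The uncontended path returns on the first `try_lock`, a briefly held lock gets picked up during the spin, and only a thread that burns the whole budget pays for the blocking call. That cost is bounded by `max_spins` iterations.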
> There is a reason default lock implementations from OS don't spin even a little bit.
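glibc's pthread mutex handles the uncontended case entirely in user space with an atomic operation, which avoids the syscall cost, and its adaptive mutex type (PTHREAD_MUTEX_ADAPTIVE_NP) spins briefly before falling back to a futex wait.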