> Turns out to be pretty crucial for performance though...
I don't doubt it. I just meant nasty with respect to using futex() to sleep instead of spinning; I was having some "fun" trying.
I can certainly see how pushing that state into one atomic would simplify things, and I didn't really mean to question that.
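
For reference, a minimal sketch of "sleep instead of spin" on a plain 32-bit word, assuming Linux and the raw futex() syscall. The helper names (futex_wait, futex_wake, wait_until_clear) and the busy-bit layout are made up for illustration and are not from the PostgreSQL tree:

```c
#define _GNU_SOURCE
#include <errno.h>
#include <linux/futex.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Sleep only if *addr still equals 'expected'. Returns 0 when woken,
 * or -1 with errno set (EAGAIN if the value had already changed).
 * The non-private FUTEX_WAIT is used here on the assumption that the
 * word lives in memory shared between processes.
 */
static long
futex_wait(_Atomic uint32_t *addr, uint32_t expected)
{
    return syscall(SYS_futex, addr, FUTEX_WAIT, expected, NULL, NULL, 0);
}

/* Wake up to 'nwaiters' sleepers; returns how many were woken. */
static long
futex_wake(_Atomic uint32_t *addr, int nwaiters)
{
    return syscall(SYS_futex, addr, FUTEX_WAKE, nwaiters, NULL, NULL, 0);
}

/*
 * Hypothetical helper: block until 'busy_bit' is clear in *state.
 * The kernel rechecks the value before sleeping, so a change between
 * our load and the syscall makes futex_wait() return immediately
 * instead of missing the wakeup.
 */
static void
wait_until_clear(_Atomic uint32_t *state, uint32_t busy_bit)
{
    uint32_t v = atomic_load(state);

    while (v & busy_bit)
    {
        futex_wait(state, v);
        v = atomic_load(state);
    }
}
```

Whoever clears the bit would follow the store with futex_wake(state, INT_MAX) or similar. The awkward part is that the compare in the kernel only covers these 32 bits.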
> We're on our way to barely ever need the spinlock for the buffer header, which then should allow us to get rid of many of those loops.
I'm cheering you on. I hadn't looked at this code before, and it's been fun looking through some of the recent work on it.
> It'd be pretty nice to have. There are lot of cases where one needs more lock state than one can really encode into a 32bit lock state.
I've seen too much open-coded spinning around 64-bit CAS in proprietary code, where it was a real, demonstrable problem, and, similar to here, it was often not straightforward to avoid. I confess to some bias because of this experience ("not all spinlocks...") :)
I remember a lot of cases where FUTEX_WAIT64/FUTEX_WAKE64 would have been a drop-in solution, which seems compelling to me.
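
To make the pattern concrete, here is a rough sketch of the kind of open-coded spinning I mean, using C11 atomics. The state layout (a lock bit in bit 63 with payload in the low bits) and the function names are invented for illustration, not taken from any real code base:

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical layout: bit 63 is the lock flag, the rest is payload. */
#define STATE_LOCKED (UINT64_C(1) << 63)

/* One attempt to set the lock bit while preserving the payload bits. */
static bool
state_try_lock(_Atomic uint64_t *state)
{
    uint64_t old = atomic_load_explicit(state, memory_order_relaxed);

    if (old & STATE_LOCKED)
        return false;
    return atomic_compare_exchange_strong_explicit(state, &old,
                                                   old | STATE_LOCKED,
                                                   memory_order_acquire,
                                                   memory_order_relaxed);
}

/*
 * The open-coded part: retry until the CAS succeeds. The futex syscall
 * compares a 32-bit value, so it cannot validate this whole 64-bit
 * word before sleeping, and the loop ends up burning CPU instead.
 */
static void
state_lock_spin(_Atomic uint64_t *state)
{
    while (!state_try_lock(state))
        ;                       /* busy-wait */
}

/* Clear the lock bit, leaving the payload untouched. */
static void
state_unlock(_Atomic uint64_t *state)
{
    atomic_fetch_and(state, ~STATE_LOCKED);
}
```

With a 64-bit wait/wake primitive, the busy-wait in state_lock_spin() could presumably become a wait on the full word, with state_unlock() issuing the wake.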