I did a lock-free MPMC ring buffer with 1 128 bit CAS and 1 64 bit CAS for enqueue and 1 64 bit CAS for dequeue. The payload is an unrestricted uintptr_t (64 bit) value so no way to avoid the 128 bit CAS in the enqueue.