For me, I’ve got use cases where it’s valuable to keep event data interleaved because it also gets used in flight. It works well enough that I also use it for things where it’s not necessary, like in-memory debug rings (which require a bit of additional work).
The epoch isn’t CAS’d; it’s FAA’d. The epoch is then used to determine whether contention is due to the tail meeting the head, or to a wrap-around caused by slow writers.
There’s also a back-off scheme to ease contention when the queue is full.
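To make that concrete, here’s a minimal sketch of the general pattern: FAA tickets plus Vyukov-style per-slot sequence numbers (epochs). This is my own illustration, not the article’s actual code; the names, the 128-slot size, and the sched_yield back-off are placeholders.

```c
#include <stdatomic.h>
#include <stdint.h>
#include <sched.h>   /* sched_yield, POSIX */

#define SLOTS 128    /* power of two */

typedef struct {
    _Atomic uint64_t epoch;  /* per-slot epoch stamp */
    void            *item;
} slot_t;

typedef struct {
    _Atomic uint64_t enq;    /* writer ticket counter, FAA'd */
    _Atomic uint64_t deq;    /* reader ticket counter, FAA'd */
    slot_t           slots[SLOTS];
} ring_t;

void ring_init(ring_t *r) {
    atomic_store(&r->enq, 0);
    atomic_store(&r->deq, 0);
    for (uint64_t i = 0; i < SLOTS; i++)
        atomic_store(&r->slots[i].epoch, i);  /* slot i holds lap 0's ticket i */
}

void ring_enqueue(ring_t *r, void *item) {
    /* FAA, not CAS: every writer gets a unique ticket on the first try. */
    uint64_t pos   = atomic_fetch_add(&r->enq, 1);
    slot_t  *slot  = &r->slots[pos & (SLOTS - 1)];
    uint64_t pause = 1;

    for (;;) {
        uint64_t e = atomic_load(&slot->epoch);
        if (e == pos) {                          /* slot is ours this lap */
            slot->item = item;
            atomic_store(&slot->epoch, pos + 1); /* publish */
            return;
        }
        if (e == pos - SLOTS + 1) {
            /* Previous lap published but not yet consumed: the tail has
             * met the head, i.e. the queue is genuinely full. Back off
             * exponentially to ease contention. */
            for (uint64_t i = 0; i < pause; i++) sched_yield();
            if (pause < 64) pause <<= 1;
        } else {
            /* e == pos - SLOTS: the previous lap's writer took its ticket
             * but hasn't published yet (slow-writer wrap-around); just
             * wait for it to finish. */
            sched_yield();
        }
    }
}

void *ring_dequeue(ring_t *r) {
    uint64_t pos  = atomic_fetch_add(&r->deq, 1);
    slot_t  *slot = &r->slots[pos & (SLOTS - 1)];

    while (atomic_load(&slot->epoch) != pos + 1)
        sched_yield();                            /* wait for the publish */
    void *item = slot->item;
    atomic_store(&slot->epoch, pos + SLOTS);      /* free slot for next lap */
    return item;
}
```

The point is the branch on the slot’s epoch: it tells a stalled writer whether it’s behind a slow reader (queue full, so back off) or a slow writer from the previous lap (just wait for the publish).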
Though I did originally have a variant that adds a fairly straightforward ‘help’ mechanism, which makes the algorithm wait-free and reduces the computational complexity.
However, the extra overhead didn’t seem worth it, so I took it out pretty quickly. IIRC, the only place where the current ring wouldn’t outperform it is tiny queues with a huge write imbalance.
If you run the tests in the repo associated with the article, you’ll probably see that a ring with only 16 entries tends to become non-performant at about a 4:1 writer-to-reader ratio. But IIRC that effect goes away before 128 slots in the ring.
At that size, the ring still fits in a page, and even with a big imbalance I can’t remember seeing fewer than 1M ops per second on my Mac laptop.
Real-world observational data beats worst-case analysis, and I’ve never seen an issue in scenarios I consider reasonable.
But if my unrealistic is realistic to you, let me know and I can break out the wait-free version for you.
> The epoch isn’t CAS’d; it’s FAA’d.
fetch_add requires taking a cache line exclusive. If producers and consumers are both doing this on the same cache line, it is very expensive and does not scale.
(withinboredom and gpderetta have raised the same or similar concerns.)
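For what it’s worth, the usual first-order mitigation is to at least keep the producer-side and consumer-side counters on separate cache lines, so one side’s fetch_add doesn’t bounce the line the other side is hammering. A sketch (field names are mine, not from the article):

```c
#include <stdalign.h>
#include <stdatomic.h>
#include <stdint.h>

/* Pad the two ticket counters onto separate cache lines (assuming
 * 64-byte lines) so a producer's FAA on `enq` doesn't invalidate the
 * line consumers are FAA'ing for `deq`, and vice versa. */
typedef struct {
    alignas(64) _Atomic uint64_t enq;  /* producers fetch_add this */
    alignas(64) _Atomic uint64_t deq;  /* consumers fetch_add this */
    /* ... slots follow ... */
} ring_counters_t;
```

That removes the cross-side traffic, but it doesn’t answer the core objection: all producers still serialize on the single line holding enq, so the FAA rate is ultimately bounded by cache-coherence latency on that line.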