It is not just a way of writing ring buffers. It's a way of implementing concurrent non-blockin...

RossBencina • yesterday at 10:02 PM • 8 replies

It is not just a way of writing ring buffers. It's a way of implementing concurrent non-blocking single-reader single-writer atomic ring buffers with only atomic load and store (and memory barriers).
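A minimal sketch of the index discipline being described, assuming a power-of-two capacity and 32-bit free-running indices. Python has no atomic load/store or memory barriers, so this only illustrates who writes which index and how the arithmetic survives wraparound; the class name `SpscRing` and the constant `MASK32` are made up for illustration.

```python
MASK32 = 0xFFFFFFFF  # emulates 32-bit unsigned wraparound (assumed index width)


class SpscRing:
    """Single-reader single-writer ring with free-running (unmasked) indices."""

    def __init__(self, capacity):
        # Power-of-two capacity, so masking is the same as taking the modulus.
        assert capacity > 0 and capacity & (capacity - 1) == 0
        self.cap = capacity
        self.buf = [None] * capacity
        self.read = 0   # stored only by the reader
        self.write = 0  # stored only by the writer

    def size(self):
        # Unsigned subtraction stays correct even after the indices wrap.
        return (self.write - self.read) & MASK32

    def push(self, item):
        if self.size() == self.cap:
            return False  # full
        self.buf[self.write & (self.cap - 1)] = item
        # In C or C++ this would be an atomic store with release ordering.
        self.write = (self.write + 1) & MASK32
        return True

    def pop(self):
        if self.size() == 0:
            return None  # empty (this sketch cannot store None as an item)
        item = self.buf[self.read & (self.cap - 1)]
        # In C or C++ this would be an atomic store with release ordering.
        self.read = (self.read + 1) & MASK32
        return item
```

Because each index has exactly one writer, neither side ever needs a read-modify-write operation; each only loads the other side's index and stores its own.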

The author says that non-power-of-two is not possible, but I'm pretty sure it is if you use a conditional instead of integer modulus.

I first learnt of this technique from Phil Burk, we've been using it in PortAudio forever. The technique is also widely known in FPGA/hardware circles, see:

"Simulation and Synthesis Techniques for Asynchronous FIFO Design", Clifford E. Cummings, Sunburst Design, Inc.

https://twins.ee.nctu.edu.tw/courses/ip_core_04/resource_pdf...

Replies

waffletower • today at 5:25 PM

Regardless of correctness, as a DSP dork I really identified with the question: "What kind of a monster would make a non-power of two ring anyway?" I remember thinking similarly when requesting a power of two buffer from a 3rd party audio hardware device and having it correct to a nearby non-power of two. Latency adding ringbuffer to the rescue.

hinkley • yesterday at 10:57 PM

I think unfortunately we sometimes ascribe to powers of two supernatural powers that are really about caches being built in powers of two.

Intel still uses 64-byte cache lines, as it has for quite a long time, but it also does some shenanigans on the bus where it tries to fetch two lines when you ask for one. So there’s ostensibly some benefit to aligning data to 128 bytes, particularly for linear scans, for cold cache access.

aidenn0 • yesterday at 10:33 PM

Non-power-of-two is only really feasible if the total number of inserts will fit in your post/ack counters. Otherwise you have to implement overflow manually, which may or may not be possible to do with the available atomic primitives on your architecture.
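A small sketch of why counter overflow matters here, assuming 32-bit free-running counters that are reduced with a modulus on access (the helper `slot` is hypothetical). With a capacity of 8 the slot sequence continues smoothly across the 2^32 boundary; with a capacity of 6 it jumps.

```python
MASK32 = 0xFFFFFFFF  # assumed counter width: 32 bits


def slot(counter, capacity):
    """Buffer slot for a free-running counter truncated to 32 bits."""
    return (counter & MASK32) % capacity


# Power of two: 2**32 is a multiple of 8, so the sequence is continuous.
before_pow2 = slot(0xFFFFFFFF, 8)      # last value before the counter wraps
after_pow2 = slot(0xFFFFFFFF + 1, 8)   # first value after the counter wraps

# Not a power of two: 2**32 is not a multiple of 6, so slots get skipped.
before_six = slot(0xFFFFFFFF, 6)
after_six = slot(0xFFFFFFFF + 1, 6)

continuous_pow2 = after_pow2 == (before_pow2 + 1) % 8
continuous_six = after_six == (before_six + 1) % 6
```

Here `continuous_pow2` is true and `continuous_six` is false: at the wrap, the capacity-6 slot goes from 3 straight to 0, which is the overflow case that would have to be handled manually.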

I first encountered this structure at a summer internship at a company making data switches.

tom_ • today at 12:57 AM

A couple of the comments to the article suggest using 64-bit numbers, which is exactly the right solution. 2^64 nanoseconds=584.55 years - overflow is implausible for any realistic use case. Even pathological cases will struggle to induce wraparound at a human timescale.
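The arithmetic can be checked with a few lines, assuming one index increment per nanosecond and a Gregorian mean year of 365.2425 days (the function name is made up for illustration):

```python
SECONDS_PER_YEAR = 365.2425 * 24 * 60 * 60  # Gregorian mean year


def seconds_until_wrap(bits, increments_per_second):
    """Time for a counter of the given width to wrap at a fixed rate."""
    return 2**bits / increments_per_second


years_64 = seconds_until_wrap(64, 1e9) / SECONDS_PER_YEAR  # about 584.55 years
seconds_32 = seconds_until_wrap(32, 1e9)                   # about 4.29 seconds
```

At the same rate a 32-bit counter wraps in a little over four seconds, which is why the wider index changes the picture so much.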

(People will probably moan at the idea of restarting the process periodically rather than fixing the issue properly, but when the period would be something like 50 years I don't think it's actually a problem.)

azemetre • yesterday at 10:29 PM

Your link has an invalid cert FYI, but I do appreciate the knowledge drop. Ring buffers are some of the cooler data structures out there.

zephen • today at 1:40 AM

> It is not just a way of writing ring buffers. It's a way of implementing concurrent non-blocking single-reader single-writer atomic ring buffers with only atomic load and store (and memory barriers).

That may or may not be part of the actual definition of a ring buffer, but every ring buffer I have written had those goals in mind.

And the first method mentioned in the article fully satisfies this, except for the one missing element mentioned by the author. Which in practice, often is not only not a problem, but simplifies the logic so much that you make up for it in code space.

Or, for example, say you have a 256 character buffer. You really, really want to make sure you don't waste that one character. So you increase the size of your indices. Now they are 16 bits each instead of 8 bits, so you've gained the ability to store 256 bytes by having 260 bytes of data, rather than 255 bytes by having 258 bytes of data.

Obviously, if you have a 64 byte buffer, there is no such tradeoff, and the third example wins (but, whether you are doing the first or third example, you still have to mask the index data off at some point, whether it's on an increment or a read).

> The author says that non-power-of-two is not possible, but I'm pretty sure it is if you use a conditional instead of integer modulus.

There's "not possible" and then "not practical."

Sure, you could have a 50 byte buffer, but now, if your indices are ever >= 50, you're subtracting 50 before accessing the array, so this will increase the code space (and execution time).
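A rough sketch of the conditional approach, under the assumption that the indices run over twice the capacity (0 to 2N-1) so that full and empty stay distinguishable. The class `CondRing` and its helpers are hypothetical; the extra branches are the code-space and time cost mentioned above.

```python
class CondRing:
    """Ring of arbitrary capacity N; indices live in [0, 2N) and wrap by branch."""

    def __init__(self, capacity):
        self.cap = capacity
        self.buf = [None] * capacity
        self.read = 0
        self.write = 0

    def _advance(self, index):
        # Conditional wrap instead of an integer modulus.
        index += 1
        return 0 if index == 2 * self.cap else index

    def _slot(self, index):
        # Subtract N before touching the array whenever the index is >= N.
        return index - self.cap if index >= self.cap else index

    def size(self):
        diff = self.write - self.read
        return diff + 2 * self.cap if diff < 0 else diff

    def push(self, item):
        if self.size() == self.cap:
            return False  # full
        self.buf[self._slot(self.write)] = item
        self.write = self._advance(self.write)
        return True

    def pop(self):
        if self.size() == 0:
            return None  # empty (this sketch cannot store None as an item)
        item = self.buf[self._slot(self.read)]
        self.read = self._advance(self.read)
        return item
```

With `CondRing(50)` all 50 slots are usable, at the price of a compare on every increment, every access, and every size calculation.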

> The [index size > array size] technique is also widely known in FPGA/hardware circles

Right, but in those hardware circles, power-of-two _definitely_ matters. You allocate exactly one extra bit for your pointers, and you never bother manually masking them or taking a modulo or anything like that -- they simply roll over.

If you really, really need to construct something like a 6 entry FIFO in hardware, then you have techniques available to you that mere mortal programmers could not use efficiently at all. For example, you could construct a drop-through FIFO, where every element traverses every storage slot (with a concomitant increase in minimum latency to 6 clock cycles), or you could construct 4 bit indices that counted 0-1-2-3-4-5-8-9-10-11-12-13-0-1-2 etc.
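The 4-bit counting sequence can be modelled in software as a 3-bit position plus one wrap bit. This is only a model of the counting pattern, not a hardware description, and the empty/full tests are an assumption about how such indices would typically be compared.

```python
DEPTH = 6          # number of FIFO entries
WRAP_BIT = 0b1000  # top bit of the 4-bit index
LOW_MASK = 0b0111  # low three bits select the storage slot


def next_index(idx):
    """Count 0..5 in the low bits, then toggle the wrap bit and restart."""
    if idx & LOW_MASK == DEPTH - 1:
        return (idx & WRAP_BIT) ^ WRAP_BIT
    return idx + 1


def is_empty(rd, wr):
    return rd == wr


def is_full(rd, wr):
    # Same slot, opposite wrap bit (assumed convention).
    return rd ^ wr == WRAP_BIT


sequence = [0]
for _ in range(14):
    sequence.append(next_index(sequence[-1]))
# sequence is 0, 1, 2, 3, 4, 5, 8, 9, 10, 11, 12, 13, 0, 1, 2
```

In hardware the skip from 5 to 8 is a little extra next-state logic; in software it is a branch on every increment, which is the point being made about mere mortal programmers.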

Most ring buffers, hardware or software, are constructed as powers of two, and most ring buffers either (a) have so much storage that one more element wouldn't make any difference, or (b) have the ability to apply back pressure, so one more element wouldn't make any difference.

ErroneousBosh • today at 9:54 AM

> The author says that non-power-of-two is not possible, but I'm pretty sure it is if you use a conditional instead of integer modulus.

I don't see why it wouldn't be, it's just computationally expensive to take the modulo value of the pointer rather than just masking off the appropriate number of bits.
