Yes, that's what I'm saying. You can't just use a quick and easy mask, you have to use a modulo operator which is computationally expensive enough that it's probably killing the time savings you made elsewhere.
There's probably no good reason to make your buffer sizes NOT a power of two, though. If memory's that tight, maybe look elsewhere first.
What I mean is: This ringbuffer implementation (and its simplicity) relies on the index range being a multiple of the buffer size (which is only true for powers of two, when the index is e.g. a 32bit unsigned integer).
If you swap bitmasking for modulo operations then that does work at first glance, but breaks down when the index wraps around. This forces you to abandon the simple "increment" operation for something more complex, too.
The requirement for a power-of-two size is more intrinsic to the approach than just the bitmasking operation itself.