Replacing just the mask operation is not enough.
The problem is incrementing past the index integer type limit.
Consider a simple example with ring buffer size 9, and 16bit indices:
When you increment the write index from 0xffff to 0, your "masked index" jumps from 6 (0xffff % 9) to 0 (instead of 7).
There is no elegant fix that I'm aware of (using a very wide index type, like possibly a uint64, is extremely non-elegant).
Yes, that's what I'm saying. You can't just use a quick and easy mask, you have to use a modulo operator which is computationally expensive enough that it's probably killing the time savings you made elsewhere.
There's probably no good reason to make your buffer sizes NOT a power of two, though. If memory's that tight, maybe look elsewhere first.