The middle approach is the only one that is not lock-free.
The first approach is lock-free, but as the author says, it wastes an element.
But here's the thing. If your element is a character, and your buffer size is, say, 256 bytes, and you are using 8-bit unsigned characters for indices, the one wasted byte is less than one percent of your buffer space, and also is compensated for by the simplicity and reduced code size.
I've used the "Waste an element" one for ages on microcontrollers where I don't want to deal with the overhead in an ISR.