It is cheaper on ARM and POWER, but I'm not sure it is always safe. The standard has very complex rules for consume to make sure that the compiler doesn't break the dependencies.
edit: and those rules were so complex that compiler writers decided they were not implementable, or not worth it.
The rules were there to explain which optimizations remained possible. Here no optimization is possible at the compiler level; only the processor retains any freedom to reorder, and that is acceptable because we know it won't use it.
It is nasty, but it's very similar to how Linux does it (volatile read + __asm__("") compiler barrier).
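For what it's worth, here is a minimal sketch of what I mean by that pattern, written in C++ with a relaxed atomic load instead of a volatile read. The names (`Node`, `g_node`, `publish`, `load_dependent`, `compiler_barrier`, `demo`) are made up for illustration, the empty asm with a "memory" clobber is a GCC/Clang extension, and the whole thing assumes the hardware keeps address-dependent loads in order. It is not something the standard promises to be safe.

```cpp
#include <atomic>
#include <cassert>

struct Node { int value; };

// Shared pointer published by a writer (hypothetical name).
std::atomic<Node*> g_node{nullptr};

// Compiler-only barrier: emits no instruction, but stops GCC/Clang from
// moving memory accesses across it. This is an extension, not standard C++.
inline void compiler_barrier() { __asm__ __volatile__("" ::: "memory"); }

// Writer side: the node is fully initialised before this release store.
inline void publish(Node* n) { g_node.store(n, std::memory_order_release); }

// Reader side: relaxed load plus compiler barrier, standing in for consume.
// Assumption: the CPU orders loads that depend on the loaded address.
inline Node* load_dependent() {
    Node* n = g_node.load(std::memory_order_relaxed);
    compiler_barrier();
    return n;
}

// Single-threaded usage demo: publish a node, then read through the pointer.
inline int demo() {
    static Node node{42};
    publish(&node);
    Node* p = load_dependent();
    assert(p != nullptr);
    return p->value;  // this load's address depends on p
}
```

The barrier only constrains the compiler. No fence instruction is emitted, which is where the saving over an acquire load on ARM and POWER would come from.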