You can fool the optimizer, but you have to work harder to do so:
unsigned add(unsigned x, unsigned y) {
unsigned a, b;
do {
a = x & y;
b = x ^ y;
x = a << 1;
y = b;
} while (a);
return b;
}
becomes (with armv8-a clang 21.1.0 -O3) : add(unsigned int, unsigned int):
.LBB0_1:
ands w8, w0, w1
eor w1, w0, w1
lsl w0, w8, #1
b.ne .LBB0_1
mov w0, w1
ret
Since I had to think about it:
It's easy to show that this algorithm is correct in the sense that, when b is returned, it must be equal to x+y. x+y summing to a constant is a loop invariant, and at termination x is 0 and y is b.It's a little more difficult to see that the loop will necessarily terminate.
New a values come from a bitwise & of x and y. New x values come from a left shift of a. This means that, if x ends in some number of zeroes, the next value of a will also end in at least that many zeroes, and the next value of x will end in an additional zero (because of the left shift). Eventually a will end in as many zeroes as there are bits in a, and the loop will terminate.