Even better / potentially more surprising:
```c
unsigned mult(unsigned x, unsigned y) {
    unsigned y0 = y;
    while (x--) y = add_v1(y, y0);
    return y;
}
```
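Note that the loop actually computes (x + 1) * y, not x * y, because y starts at y0 and then gets y0 added x more times. That is why it optimizes to a single multiply-add (`madd w0, w1, w0, w1` is w0 = w1 * w0 + w1, i.e. y * x + y):

```asm
mult(unsigned int, unsigned int):
        madd    w0, w1, w0, w1
        ret
```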
(and you get the same code when substituting any of the `add_vN`s from TFA)
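To sanity-check the equivalence without TFA's code at hand, here is a small C sketch. `add_sketch` is a hypothetical stand-in for `add_v1` (a carry-propagating adder, not the article's implementation), and `mult_madd` spells out what the `madd` instruction computes. It assumes a 32-bit `unsigned`, matching the AArch64 `w` registers.

```c
#include <assert.h>

/* Hypothetical stand-in for TFA's add_v1 (NOT the article's code):
   adds by repeatedly propagating carries; the top carry shifts out,
   so the result wraps mod 2^32 just like ordinary unsigned addition. */
static unsigned add_sketch(unsigned a, unsigned b) {
    while (b != 0) {
        unsigned carry = a & b;  /* bit positions that generate a carry */
        a ^= b;                  /* sum ignoring carries */
        b = carry << 1;          /* carries move one position left */
    }
    return a;
}

/* The loop from the comment, using the stand-in adder. */
unsigned mult(unsigned x, unsigned y) {
    unsigned y0 = y;
    while (x--) y = add_sketch(y, y0);
    return y;
}

/* What `madd w0, w1, w0, w1` computes: w0 = w1 * w0 + w1, i.e. y * x + y. */
unsigned mult_madd(unsigned x, unsigned y) {
    return y * x + y;
}
```

For example, `mult(3, 5)` returns 20, i.e. (3 + 1) * 5, and `mult(1000, 0xFFFFFFFF)` still agrees with `mult_madd` even though the product wraps around.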