> this looks optimized to me.
It's not. Why would lsl+csel or add+csel or cmp+csel ever be faster than a simple add? Or have higher throughput? Or require less energy? An integer addition is just about the lowest-latency operation you can do on mainstream CPUs, apart from register-renaming operations that never leave the front-end.
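To make the comparison concrete, here is a hypothetical C sketch. It is not the code under discussion: the function names and the convention that `cond` is always 0 or 1 are assumptions for illustration. Both functions return the same value under that assumption, but the first is written in the shape that invites an add+csel pair, while the second is a single add. What a compiler actually emits depends on the target and flags, and a good optimizer may well turn both into the same instruction.

```c
#include <stdint.h>

/* Select form: compute a + 1, then choose between it and a.
   Assumes cond is 0 or 1 (hypothetical convention). */
uint32_t inc_select(uint32_t a, uint32_t cond) {
    uint32_t bumped = a + 1;      /* the add */
    return cond ? bumped : a;     /* the compare plus conditional select */
}

/* Plain form: same result when cond is 0 or 1, one addition. */
uint32_t inc_add(uint32_t a, uint32_t cond) {
    return a + cond;
}
```

In the select form the final select has to wait for both the add and the compare, so the dependency chain is at least two operations deep; the plain form is one.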
ARM is a big target; there could be CPUs where lsl is 1 cycle and add is 2+.
Without knowing the specific compiler target and settings, this looks reasonable.
Dumb in the majority case? Absolutely, but smart on the lowest common denominator.