I don't think clang is being "aggressive" on ARM; it's just that all aarch64 targets support FMA. You'll get similar results with vfmadd213ss on x86-64 with -march=haswell (13 years old at this point, probably a safe bet).
float fma(float x) {
    return 3.0f * x + 1.0f;
}
Clang armv8 21.1.0:
fma(float):
        sub sp, sp, #16
        str s0, [sp, #12]
        ldr s1, [sp, #12]
        fmov s2, #1.00000000
        fmov s0, #3.00000000
        fmadd s0, s0, s1, s2
        add sp, sp, #16
        ret
Clang x86-64 21.1.0:
.LCPI0_0:
        .long 0x3f800000
.LCPI0_1:
        .long 0x40400000
fma(float):
        push rbp
        mov rbp, rsp
        vmovss dword ptr [rbp - 4], xmm0
        vmovss xmm1, dword ptr [rbp - 4]
        vmovss xmm2, dword ptr [rip + .LCPI0_0]
        vmovss xmm0, dword ptr [rip + .LCPI0_1]
        vfmadd213ss xmm0, xmm1, xmm2
        pop rbp
        ret
The point is that there are multiple, meaningfully different implementations for the same line (a separate multiply and add rounds twice, a fused multiply-add rounds once), not that either is wrong. Sometimes compilers will even produce both implementations and call one or the other based on runtime checks, as this ICC example does:
https://godbolt.org/z/KnErdebM5