Yep, good stuff. Another nice trick to extract more ILP is to split the polynomial into its even- and odd-exponent terms, evaluate the two halves independently, and recombine at the end (not sure if this has a name). This also makes it amenable to SLP vectorization (although I doubt the compiler will do this nicely on its own). For example, something like:
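typedef double v2d __attribute__ ((vector_size (16)));
// As, Bs, ... hold { odd coeff, even coeff } pairs, highest degree first
v2d x2 = { x * x, x * x };
v2d packed = As;
packed = packed * x2 + Bs;  // one Horner step in x^2 per lane; can contract to an fma
packed = packed * x2 + Cs;
packed = packed * x2 + Ds;
// ...
return x * packed[0] + packed[1];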
Something like that.

Actually, one project I was thinking of doing was creating SLP-vectorized versions of libm functions, since plenty of programs spend a lot of time in libm calling functions on a single input at a time, but the implementations are usually a bunch of scalar instructions.
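
To make the even/odd split a bit more concrete, here is a rough, self-contained sketch using the GCC/Clang vector extension. The names `poly_horner` and `poly_even_odd`, the ascending-order coefficient array, and the requirement that the coefficient count is even are all my own assumptions for illustration, not anything from a real libm. It relies on p(x) = E(x^2) + x * O(x^2), with the odd coefficients in lane 0 and the even ones in lane 1.

```c
#include <assert.h>
#include <stddef.h>

/* GCC/Clang vector extension: two doubles packed in one 16-byte vector. */
typedef double v2d __attribute__((vector_size(16)));

/* Reference: plain Horner, c[0] + c[1]*x + ... + c[n-1]*x^(n-1).
   A single serial dependency chain of n multiply-adds. */
double poly_horner(const double *c, size_t n, double x)
{
    double acc = 0.0;
    for (size_t i = n; i-- > 0;)
        acc = acc * x + c[i];
    return acc;
}

/* Even/odd split (hypothetical helper): lane 0 runs Horner in x^2 over the
   odd-index coefficients, lane 1 over the even-index ones, so the serial
   chain is roughly half as long. Assumes n is even and at least 2. */
double poly_even_odd(const double *c, size_t n, double x)
{
    assert(n >= 2 && n % 2 == 0);
    v2d x2 = { x * x, x * x };
    v2d acc = { c[n - 1], c[n - 2] };        /* { top odd, top even } */
    for (size_t i = n - 2; i >= 2; i -= 2) {
        v2d coeffs = { c[i - 1], c[i - 2] }; /* next { odd, even } pair */
        acc = acc * x2 + coeffs;             /* one Horner step per lane */
    }
    return x * acc[0] + acc[1];              /* recombine: x * O + E */
}
```

For a real libm-style kernel the coefficients would presumably be compile-time constants already laid out as pairs, so the loop would be fully unrolled. Note that the two versions round differently, so the results can differ in the last bits for general inputs.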