It's actually Rust that is setting the 64 wide limit (see SupportedLaneCount), not LLVM.
I agree, f64x64 is probably a very bad idea.
But something like f32x8 would probably still be "fast enough" on old/mobile CPUs without 256 wide vectors (but good 128 bit SIMD ALU).
I did something like this when using a u16x16 bitmask fit the problem domain. Most of my target CPUs have 256 wide registers but on mobile ARM land they don't. This wasn't particularly performance sensitive code so I just used 256 bit wide vectors anyway. It wasn't worth it trying to optimize for the old CPUs separately.