I think LLVM can handle vectors of just under 65536 bits? At least I saw an LLVM issue noting that that's where some things broke down, so a 32-kbit vector would be f64x512. But I meant it as an exaggeration as far as actually using it goes: f64x64 is hilariously overkill even on AVX-512, and utterly awful on pre-AVX-512 hardware.
It's actually Rust that sets the 64-lane limit (see SupportedLaneCount), not LLVM.
I agree, f64x64 is probably a very bad idea.
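To put numbers on the sizes being discussed, here is a minimal sketch of the arithmetic in stable Rust. The helper names vector_bits and registers_needed are made up for illustration and are not part of std::simd or LLVM; the register widths passed in (512, 256, 128 bits) are the usual AVX-512, AVX2, and SSE2/NEON sizes.

```rust
/// Total width in bits of a vector with `lanes` elements of `elem_bits` bits each.
/// (Hypothetical helper, for illustration only.)
const fn vector_bits(lanes: u32, elem_bits: u32) -> u32 {
    lanes * elem_bits
}

/// Number of hardware registers of `reg_bits` bits needed to hold one such
/// vector, rounded up. (Hypothetical helper, for illustration only.)
const fn registers_needed(lanes: u32, elem_bits: u32, reg_bits: u32) -> u32 {
    (vector_bits(lanes, elem_bits) + reg_bits - 1) / reg_bits
}
```

With these, vector_bits(512, 64) is 32768, so f64x512 is the 32-kbit case. A single f64x64 value is 4096 bits, which works out to registers_needed(64, 64, 512) = 8 registers on AVX-512, 16 with 256-bit registers, and 32 with 128-bit registers. x86-64 without AVX-512 only has 16 vector registers, so one such value already fills or overflows the whole register file there. By contrast, f32x8 on a 128-bit machine is just registers_needed(8, 32, 128) = 2 registers.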
But something like f32x8 would probably still be "fast enough" on old/mobile CPUs that lack 256-bit vectors but have good 128-bit SIMD ALUs.
I did something like this when a u16x16 bitmask fit the problem domain. Most of my target CPUs have 256-bit registers, but those in mobile ARM land don't. This wasn't particularly performance-sensitive code, so I just used 256-bit vectors anyway. It wasn't worth trying to optimize for the old CPUs separately.
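As a rough illustration of the "just write it 256 bits wide everywhere" approach, here is a sketch on stable Rust. It is not the actual code described above: std::simd is still nightly-only, so this uses a plain [u16; 16] wrapper as a stand-in for u16x16, and the type name Mask16x16 and its methods are hypothetical. The idea is the same either way: the source is written once at 16 lanes, and the compiler is free to lower it to one 256-bit operation or to two 128-bit operations, depending on the target.

```rust
/// Hypothetical stand-in for a `u16x16` bitmask: 16 lanes of 16 bits = 256 bits.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct Mask16x16([u16; 16]);

impl Mask16x16 {
    /// Lane-wise bitwise AND. The fixed-length loop is a good candidate for
    /// auto-vectorization; on 128-bit targets it can be split into two halves.
    fn and(self, other: Self) -> Self {
        let mut out = [0u16; 16];
        for i in 0..16 {
            out[i] = self.0[i] & other.0[i];
        }
        Mask16x16(out)
    }

    /// True if any bit in any lane is set.
    fn any(self) -> bool {
        self.0.iter().any(|&lane| lane != 0)
    }
}
```

Whether the loop actually vectorizes depends on the compiler version, optimization level, and target features, so treat this as a sketch of the shape of the code rather than a performance claim.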