exDM69 • last Saturday at 3:00 PM

It's actually Rust that sets the 64-lane limit (see SupportedLaneCount), not LLVM.

I agree, f64x64 is probably a very bad idea.

But something like f32x8 would probably still be "fast enough" on old/mobile CPUs that lack 256-bit vectors but have good 128-bit SIMD ALUs.

I did something like this when a u16x16 bitmask fit the problem domain. Most of my target CPUs have 256-bit registers, but in mobile ARM land they don't. This wasn't particularly performance-sensitive code, so I just used 256-bit vectors anyway. It wasn't worth trying to optimize for the old CPUs separately.
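
To make that concrete, here is a rough sketch of what a "u16x16 as a bitmask" type could look like. Everything in it is an assumption for illustration: the name `Grid16x16`, the 16x16 grid interpretation, and the methods are hypothetical, not the code I actually shipped. It also uses a plain `[u16; 16]` array instead of `std::simd::u16x16`, because portable SIMD is still nightly-only; the array is a stand-in so the example builds on stable Rust. How the compiler lowers the lane-wise loop (one 256-bit operation, two 128-bit ones, or scalar code) depends on the target and is not guaranteed.

```rust
/// Hypothetical 16x16 bit grid: 16 lanes of u16, 256 bits in total.
/// Stand-in for a `u16x16` vector so the sketch compiles on stable Rust.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Grid16x16(pub [u16; 16]);

impl Grid16x16 {
    /// A grid with no bits set.
    pub const EMPTY: Self = Grid16x16([0; 16]);

    /// Set the bit at (row, col). Both indices must be below 16.
    pub fn set(&mut self, row: usize, col: usize) {
        assert!(row < 16 && col < 16, "index out of range");
        self.0[row] |= 1u16 << col;
    }

    /// Lane-wise AND: keeps only the bits set in both grids.
    /// The code is written once for all 16 lanes, regardless of how wide
    /// the target's registers actually are.
    pub fn and(self, other: Self) -> Self {
        Grid16x16(std::array::from_fn(|i| self.0[i] & other.0[i]))
    }

    /// Total number of set bits across all 16 lanes.
    pub fn count(self) -> u32 {
        self.0.iter().map(|lane| lane.count_ones()).sum()
    }
}
```

The point is only that the code is written against the width that fits the problem, and narrower hardware is left to the compiler rather than handled with a separate code path.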