> first class SIMD in languages
People have said this for longer than I've been alive. I don't think it's a meaningful concept.
Here is how I would have it:
Instead of writing:
fn do_athing(val: f32x69) -> f32x69 {
}
you write:
fn do_athing(val: fx) -> fx {
}
And it just works. f32, f64, as wide as your CPU supports.
Here we are discussing the merits of the built-in SIMD facilities of not one but two programming languages. Waiting for the Zig guys to chime in to make it three.
Four if you include LLVM IR (I don't).
No reason to be dismissive about it.
It’s such a ridiculous situation we’re in. Just about every consumer CPU of the past 20 years packs an extra order of magnitude or two of punch for data processing workloads, but to keep it from going to waste you have to write your inner loops with low-level, nonportable intrinsics that are just a step above assembly. Or pray that the gods of autovectorization are on your side.
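For what it's worth, here is a rough sketch of the "pray to the autovectorizer" route in stable Rust, written so the lane count is a parameter rather than baked into the type name, loosely in the spirit of the width-agnostic signature suggested earlier in the thread. The names `axpy` and `LANES` are made up for illustration, and nothing here is a real SIMD API: it is plain slices plus const generics. Whether the inner loop actually compiles to vector instructions depends on the compiler, optimization level, and target, so treat it as an assumption, not a guarantee.

```rust
/// Computes y[i] = a * x[i] + y[i] over two equal-length slices.
///
/// `LANES` is a hypothetical stand-in for "as wide as your CPU supports".
/// It only fixes the chunk size; it does not force SIMD code generation.
/// `LANES` must be non-zero, because `chunks_exact` panics on a chunk size of 0.
fn axpy<const LANES: usize>(a: f32, x: &[f32], y: &mut [f32]) {
    assert_eq!(x.len(), y.len());

    let mut xs = x.chunks_exact(LANES);
    let mut ys = y.chunks_exact_mut(LANES);

    // Main loop: each chunk has exactly LANES elements, so the trip count of
    // the inner loop is a compile-time constant the optimizer can work with.
    for (xc, yc) in (&mut xs).zip(&mut ys) {
        for i in 0..LANES {
            yc[i] = a * xc[i] + yc[i];
        }
    }

    // Scalar tail: whatever did not fill a whole chunk.
    for (xv, yv) in xs.remainder().iter().zip(ys.into_remainder()) {
        *yv = a * *xv + *yv;
    }
}
```

You would call it as `axpy::<8>(2.0, &x, &mut y)`, picking the width at the call site. That is still a long way from "it just works": the caller has to choose `LANES` by hand, and the only way to know whether it vectorized is to look at the generated assembly.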