I'm saying that the shape of the SIMD is pretty much the same across platforms. Vector width differs between architectures, as does whether that width is fixed at compile time or discovered at runtime, but you'll have to convince me that the vector width is such an essential component of the abstract description of the computation that you fundamentally can't abstract it away. (In fact, the success of RVV and ARM SVE should tell us that we can describe SIMD computation in a vector width-independent way.)
All vector instruction sets offer things like "multiply/add/subtract/divide the elements in two vector registers", so that is clearly not the part that's impossible to describe portably.
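To make that concrete, here's a minimal sketch (function name is mine) of the kind of computation every vector ISA handles: the per-element description never mentions a width, and an autovectorizer, or RVV/SVE codegen, maps it onto whatever vector length the target has.

```c
#include <stddef.h>

/* Elementwise multiply-add: out[i] = a[i] * b[i] + c[i].
 * The computation is stated per element; the compiler is free to
 * process 4, 8, or 16 lanes per iteration depending on the target,
 * so the vector width never appears in the source. */
void mul_add(float *out, const float *a, const float *b,
             const float *c, size_t n) {
    for (size_t i = 0; i < n; i++)
        out[i] = a[i] * b[i] + c[i];
}
```

This is the easy, portable core of SIMD; the disagreement below is about everything that isn't shaped like this loop.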
It's really not. As an example, for string processing tasks (including codecs, which various server software spends a significant percentage of its runtime on), NEON includes a deinterleaving load into 4 registers and byte-wise shuffles that accept 2, 3, or 4 registers' worth of lookup table. These primitives are quite different from those available on AVX2 or AVX-512, and the fact that they are available and cheap to use means you end up with somewhat different algorithms for the two types of targets. Even the knowledge of how to use AVX2's toolbox well for this sort of task is somewhat obscure. Folks who have worked on codec-type stuff but primarily used AVX-512 often have trouble figuring out how to do most of the same things in similar instruction counts on targets where masked versions of the instructions are not available.
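For readers who haven't used NEON: a scalar model of the two primitives mentioned above, with semantics following the intrinsics `vld4q_u8` (structure load with deinterleave) and `vqtbl4q_u8` (shuffle through a 64-byte table); the C function names are mine, and on NEON each of these is a single cheap instruction, which is exactly what AVX2 has no direct counterpart for.

```c
#include <stdint.h>

/* Model of vld4q_u8: load 64 bytes and deinterleave them into four
 * 16-byte lanes, so lane j receives every 4th byte starting at
 * offset j. Useful for splitting packed RGBA, UTF-8 state, etc. */
static void deinterleave_load4(uint8_t lanes[4][16], const uint8_t *mem) {
    for (int i = 0; i < 16; i++)
        for (int j = 0; j < 4; j++)
            lanes[j][i] = mem[4 * i + j];
}

/* Model of vqtbl4q_u8: byte shuffle through a 64-byte lookup table
 * held in four registers; out-of-range indices produce 0 rather
 * than faulting, which codecs lean on for range classification. */
static void tbl4(uint8_t dst[16], const uint8_t table[64],
                 const uint8_t idx[16]) {
    for (int i = 0; i < 16; i++)
        dst[i] = (idx[i] < 64) ? table[idx[i]] : 0;
}
```

The nearest AVX2 tool, `vpshufb`, only indexes a 16-byte table per 128-bit lane, so emulating a 64-byte lookup takes several shuffles plus blends, which is why the algorithms end up shaped differently.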