Fine. What's canonical or most basic example of where SIMD should be applied, but isn't because it's too tricky to do so?
In our shop, we never look to vectorize any function or process unless it's called inside a loop many times.
> What's canonical or most basic example of where SIMD should be applied, but isn't because it's too tricky to do so?
There is none. That's a contradiction in terms. SIMD either fits the shape or it doesn't.
Text processing. Loading/branching/storing content 1 byte at a time is the CPUs worst nightmare, but most text processing is quite tricky in SIMD.