Compared to GPU programming, the gains from SIMD are limited, but it's a small-multiple boost and it's available pretty much everywhere. C# makes it easy to use through its Vector classes. WASM SIMD still has a way to go, but even with the current 128-bit vectors you can see dramatic improvements in some buffer-processing cases. (I did a little comparison demo here showing a 20x improvement in bitwise complement of a large buffer: https://www.jasonthorsness.com/2)
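For anyone who wants to see the shape of that kind of loop, here is a minimal sketch of a 128-bit bitwise complement in C. It is not the code from the linked demo. It assumes a GCC or Clang toolchain (the `vector_size` attribute is a compiler extension, not standard C), and the names `u8x16` and `complement_buffer` are made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* 16 lanes of uint8_t = 128 bits. GCC/Clang vector extension (assumption:
   not available on compilers that lack this attribute). */
typedef uint8_t u8x16 __attribute__((vector_size(16)));

/* Hypothetical helper: complement every byte of buf in place. */
static void complement_buffer(uint8_t *buf, size_t len)
{
    size_t i = 0;

    /* Main loop: 16 bytes per iteration. */
    for (; i + 16 <= len; i += 16) {
        u8x16 v;
        memcpy(&v, buf + i, sizeof v);  /* load that is safe for unaligned buf */
        v = ~v;                         /* one complement across all 16 lanes */
        memcpy(buf + i, &v, sizeof v);  /* store the result back */
    }

    /* Scalar tail for the last len % 16 bytes. */
    for (; i < len; i++)
        buf[i] = (uint8_t)~buf[i];
}
```

The compiler lowers the vector type to whatever the target offers, so the same source can map to SSE2, NEON, or WASM SIMD when the matching target features are enabled. How much faster it ends up than a plain byte loop depends heavily on the compiler and runtime, so treat any speedup figure as something to measure rather than assume.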
The high arithmetic bandwidth on GPUs is of course SIMD-based as well. They just tend to have an ISPC-style compilation model that doesn't expose the SIMD lanes in the source code. (Whereas on the CPU side, SIMD is still only lightly used by compilers, even after decades.)
The WASM folks should just include an arbitrary-length vector compute extension. We should also explore automatically compiling WASM to GPU compute where appropriate; the hardware independence makes it a rather natural fit for that.
I merged a few PRs to SIMD-optimize the Wasm WASI libc, but it all stalled at str(c)spn (which is slightly more sophisticated than the rest).
There wasn't much appetite for any of it on Emscripten.
https://github.com/WebAssembly/wasi-libc/pulls?q=is%3Apr+opt...
> a small-multiple boost
Quick reminder that a 20x boost is better than shaving a log n factor off an algorithm (say, going from O(n log n) to O(n)) for up to a million items, since log2(1,000,000) is about 20. And asymptotically better algorithms often are simply not possible for many problems.