Any chance those AVX-512 optimizations they released a while ago work within this? [1]
Note that those only apply to scene_sad, which is used for scene change detection, freeze detection, and a few other things like mpdecimate. It's a very specific use case.
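For anyone unfamiliar with it, scene_sad is a sum of absolute differences (SAD) between two frames. Here is a rough Python sketch of the idea, not FFmpeg's actual code: the function names, the flat `bytes` frame layout, and the threshold value are all made up for illustration.

```python
def sad(prev: bytes, curr: bytes) -> int:
    """Sum of absolute differences between two equally sized 8-bit frames."""
    if len(prev) != len(curr):
        raise ValueError("frames must have the same size")
    # Iterating over bytes yields ints in 0..255, so this is a per-sample diff.
    return sum(abs(a - b) for a, b in zip(prev, curr))


def looks_like_scene_change(prev: bytes, curr: bytes, threshold: float = 40.0) -> bool:
    """Hypothetical check: flag a cut when the mean per-sample difference is large."""
    if not prev:
        return False
    mean_diff = sad(prev, curr) / len(prev)
    return mean_diff > threshold
```

The inner loop is the part that benefits from SIMD, since each sample is handled independently before the final sum.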
I think WASM SIMD is only 128-bit wide.