Oh hey, I wrote a Stackblur implementation in Rust. The trick I used is to apply SIMD across multiple rows/columns of the image at once, rather than trying to vectorize the algorithm itself.
https://github.com/logandark/stackblur-iter
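
To make the idea concrete, here is a rough sketch of what "SIMD across rows" can look like. It is not code from the crate above: the names (`LANES`, `load`, `blur_rows`, `blur_horizontal`) are made up for illustration, and it assumes an 8-bit grayscale image in row-major order, clamp-to-edge borders, plain integer division, and a height that is a multiple of `LANES`. It also relies on the compiler auto-vectorizing the fixed-size lane loops rather than using explicit SIMD types.

```rust
/// Number of rows blurred together. Hypothetical value; tune per target.
const LANES: usize = 8;

/// Reads column `x` (clamped to the image edge) from `LANES` consecutive
/// rows starting at row `y0`, one value per lane.
fn load(src: &[u8], width: usize, y0: usize, x: isize) -> [u32; LANES] {
    let cx = x.clamp(0, width as isize - 1) as usize;
    std::array::from_fn(|l| src[(y0 + l) * width + cx] as u32)
}

/// Horizontal stack blur of rows `y0..y0 + LANES`. Every running sum is an
/// array with one slot per row, so each scalar step of the algorithm becomes
/// a short fixed-size loop that can map onto one vector instruction.
fn blur_rows(src: &[u8], dst: &mut [u8], width: usize, y0: usize, radius: usize) {
    let r = radius as isize;
    let div = ((radius + 1) * (radius + 1)) as u32; // sum of triangular weights
    let mut sum = [0u32; LANES]; // weighted sum under the kernel
    let mut sum_in = [0u32; LANES]; // pixels at offsets 1..=r+1
    let mut sum_out = [0u32; LANES]; // pixels at offsets -r..=0

    // Prime the three sums for x = 0. Offset r+1 has weight 0 in `sum`
    // but already belongs to `sum_in`.
    for i in -r..=r + 1 {
        let px = load(src, width, y0, i);
        let weight = (r + 1 - i.abs()) as u32;
        for l in 0..LANES {
            sum[l] += weight * px[l];
            if i > 0 {
                sum_in[l] += px[l];
            } else {
                sum_out[l] += px[l];
            }
        }
    }

    for x in 0..width as isize {
        for l in 0..LANES {
            dst[(y0 + l) * width + x as usize] = (sum[l] / div) as u8;
        }
        let leaving = load(src, width, y0, x - r); // drops out of sum_out
        let center = load(src, width, y0, x + 1); // moves from sum_in to sum_out
        let entering = load(src, width, y0, x + r + 2); // joins sum_in
        for l in 0..LANES {
            sum[l] = sum[l] + sum_in[l] - sum_out[l];
            sum_in[l] = sum_in[l] + entering[l] - center[l];
            sum_out[l] = sum_out[l] + center[l] - leaving[l];
        }
    }
}

/// Blurs the whole image horizontally, `LANES` rows per call.
/// Sketch only: requires `height` to be a multiple of `LANES`.
fn blur_horizontal(src: &[u8], width: usize, height: usize, radius: usize) -> Vec<u8> {
    assert_eq!(src.len(), width * height);
    assert_eq!(height % LANES, 0, "sketch only handles full groups of rows");
    let mut dst = vec![0u8; src.len()];
    for y0 in (0..height).step_by(LANES) {
        blur_rows(src, &mut dst, width, y0, radius);
    }
    dst
}
```

Calling `blur_horizontal(&pixels, width, height, radius)` returns a new buffer with each row blurred. The sequential dependency along a row (each output depends on the previous running sums) stays scalar; the parallelism comes entirely from doing the same step for several independent rows at once. A vertical pass would work the same way with columns as the lanes.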