Hacker News

spacecadet_ today at 5:00 PM

> Very few instructions even allowed interaction between the top and bottom 128 bits

That would be plain AVX; AVX2 has shuffles that cross the 128-bit boundary. To me that seems like the main hurdle for emulation with 128-bit vectors. In my experience compilers are very eager to emit shuffle instructions if allowed, and emulating a 256-bit shuffle with 128-bit operations would require 2 shuffles and a blend for each half of the emulated register.
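To make the "2 shuffles and a blend for each half" count concrete, here is a rough scalar model in Python. It is not real intrinsics, and all the function names are made up for illustration. It assumes an 8-lane variable permute of 32-bit elements (in the style of AVX2's `vpermd`) applied to a register that the emulator holds as two 4-lane halves:

```python
def shuffle128(src, idx):
    """Model a 4-lane variable shuffle: lane k gets src[idx[k] & 3]."""
    return [src[i & 3] for i in idx]


def blend128(a, b, mask):
    """Model a 4-lane blend: lane k gets b[k] where mask[k] is set, else a[k]."""
    return [b[k] if mask[k] else a[k] for k in range(4)]


def permute256_emulated(lo, hi, idx):
    """Emulate an 8-lane cross-boundary permute on a register split into lo/hi halves.

    Each output half needs one shuffle of the low source half, one shuffle
    of the high source half, and a blend that picks between them using
    bit 2 of each index (which says which source half the lane comes from).
    """
    halves = []
    for half_idx in (idx[:4], idx[4:]):
        from_lo = shuffle128(lo, half_idx)                 # shuffle 1
        from_hi = shuffle128(hi, half_idx)                 # shuffle 2
        take_hi = [(i >> 2) & 1 for i in half_idx]
        halves.append(blend128(from_lo, from_hi, take_hi)) # blend
    return halves[0], halves[1]


def permute256_reference(src, idx):
    """What a native 8-lane permute would produce in one instruction."""
    return [src[i & 7] for i in idx]


src = [10, 11, 12, 13, 14, 15, 16, 17]
idx = [7, 0, 5, 2, 3, 4, 1, 6]
out_lo, out_hi = permute256_emulated(src[:4], src[4:], idx)
assert out_lo + out_hi == permute256_reference(src, idx)
```

So one native instruction turns into six modeled operations (plus deriving the blend mask), which is where I would expect the overhead to show up in shuffle-heavy code.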

EDIT: I just noticed that the benchmark in the article is pure math which probably wouldn't hit this particular issue, so this doesn't explain the performance difference...