Well, to look at the last of that list: it added 134 lines to the project and removed 3.
Of those, the actual change was
- __m256i mul_one;
- mul_one = _mm256_abs_epi8(_mm256_cmpeq_epi16(mul_one,mul_one)); // set all vector elements to 1
+ __m256i mul_one = _mm256_set1_epi8(1);
and the rest was testing that fix.
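
For anyone wondering what a test of that one-liner might look like, here is a minimal sketch. It is not the project's actual test: the function names `set1_is_all_ones_avx2` and `check_mul_one` are made up for illustration, and it assumes GCC or Clang on x86 (for the `target` attribute and `__builtin_cpu_supports`). It checks that `_mm256_set1_epi8(1)` yields 1 in every byte, and that the old compare-then-abs trick gives the same result when started from a defined value. The removed code read `mul_one` before it was ever assigned, which is presumably why it was replaced.

```c
#include <immintrin.h>
#include <stdint.h>

/* Hypothetical helper, not taken from the project.
   Returns 1 if both idioms produce a vector of 32 bytes all equal to 1,
   0 otherwise. Compiled for AVX2 via the target attribute, so no -mavx2
   flag is needed (GCC/Clang assumption). */
__attribute__((target("avx2")))
static int set1_is_all_ones_avx2(void)
{
    /* The new form from the diff. */
    __m256i mul_one = _mm256_set1_epi8(1);

    /* The old trick, but seeded with a defined value instead of an
       uninitialised variable: x == x sets every 16-bit lane to 0xFFFF,
       i.e. every byte to -1, and abs(-1) is 1. */
    __m256i zero = _mm256_setzero_si256();
    __m256i old_style = _mm256_abs_epi8(_mm256_cmpeq_epi16(zero, zero));

    uint8_t a[32], b[32];
    _mm256_storeu_si256((__m256i *)a, mul_one);
    _mm256_storeu_si256((__m256i *)b, old_style);

    for (int i = 0; i < 32; i++) {
        if (a[i] != 1 || b[i] != 1)
            return 0;
    }
    return 1;
}

/* Hypothetical entry point: returns -1 when the CPU has no AVX2 (check
   skipped), otherwise the result of the comparison above. */
int check_mul_one(void)
{
    if (!__builtin_cpu_supports("avx2"))
        return -1;
    return set1_is_all_ones_avx2();
}
```

Calling `check_mul_one()` from whatever test harness you use and treating 0 as failure would be enough; the -1 case just means the machine could not run the check.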