cogman10 • today at 1:59 PM • 1 reply

What's strange is I'm finding that gcc really struggles to correctly optimize this.

This was my function

    for (auto v : array) {
        if (v != 0)
            return false;
    }
    return true;

clang emits basically the same thing yours does. But gcc really struggles to vectorize this as the number of array elements grows.

Here's gcc for 42 elements:

https://godbolt.org/z/sjz7xd8Gs

and here's clang for 42 elements:

https://godbolt.org/z/frvbhrnEK

Very bizarre. Clang readily sees that it can use SIMD instructions and optimizes this well, while GCC seems reluctant to use them. I've even seen strange output where GCC emits SIMD instructions for the first loop and then falls back on regular x86 compares for the rest.

Edit: Actually, it looks like for large enough array sizes, it flips. At 256 elements, gcc ends up emitting SIMD instructions while clang emits plain x86. So strange.

Replies

secondcoming • today at 3:54 PM

I've had to coerce gcc into emitting SIMD code by using int instead of bool. Also, the early return may be putting it off.
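
For anyone who wants to try both ideas side by side, here is a minimal sketch. The function names, the `int` element type, and the use of `std::array` are all assumptions for illustration, not what the godbolt links above actually contain. The first function is the early-return loop from the parent comment; the second drops the early return and accumulates into an `int`, which is one way to read the suggestion here. Whether either one vectorizes depends on the compiler version, flags, and array size, so check the output yourself.

```cpp
#include <array>
#include <cstddef>

// Early-return loop, as in the parent comment.
// (Name and element type are assumptions.)
template <std::size_t N>
bool all_zero_early(const std::array<int, N>& array) {
    for (auto v : array) {
        if (v != 0)
            return false;
    }
    return true;
}

// Variant with no early return: OR every element into an int
// accumulator and compare once at the end.
template <std::size_t N>
bool all_zero_or(const std::array<int, N>& array) {
    int acc = 0;
    for (auto v : array)
        acc |= v;
    return acc == 0;
}

// Explicit instantiations so the compiler emits code for the two
// sizes discussed in the thread (42 and 256).
template bool all_zero_early<42>(const std::array<int, 42>&);
template bool all_zero_early<256>(const std::array<int, 256>&);
template bool all_zero_or<42>(const std::array<int, 42>&);
template bool all_zero_or<256>(const std::array<int, 256>&);
```

Note the trade-off: the second version always reads the whole array, so it gives up the short-circuit on an early nonzero element in exchange for a loop with no data-dependent exit.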
