Hacker News

camel-cdr, last Saturday at 10:54 AM

Here is an example using Google Highway: https://godbolt.org/z/Y8vsonTb8

See how the code has only been written once, but multiple versions of the same functions were generated, each targeting different hardware features (e.g. SSE, AVX, AVX512). Then `HWY_DYNAMIC_DISPATCH` can be used to dynamically call the fastest one matching your CPU at runtime.
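For readers who cannot open the link, here is a minimal sketch of the general idea. It does not use Highway and is not how Highway is implemented. It assumes GCC or Clang on x86-64 and relies on the compiler's `target` attribute and `__builtin_cpu_supports`. The names `SumScalar`, `SumAvx2`, `ChooseSum`, and `Sum` are made up for illustration. Unlike Highway, the two versions are written by hand here rather than generated from a single source.

```cpp
#include <cstddef>

#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
#include <immintrin.h>
#define SKETCH_HAVE_X86_DISPATCH 1
#else
#define SKETCH_HAVE_X86_DISPATCH 0
#endif

// Baseline version: compiles and runs on any platform.
float SumScalar(const float* data, std::size_t n) {
  float total = 0.0f;
  for (std::size_t i = 0; i < n; ++i) total += data[i];
  return total;
}

#if SKETCH_HAVE_X86_DISPATCH
// Vector version: the target attribute lets this one function use 256-bit
// instructions even when the rest of the file is built for baseline x86-64.
// (The float operations below only need AVX, which AVX2 implies.)
__attribute__((target("avx2")))
float SumAvx2(const float* data, std::size_t n) {
  __m256 acc = _mm256_setzero_ps();
  std::size_t i = 0;
  // Add eight floats per iteration.
  for (; i + 8 <= n; i += 8) {
    acc = _mm256_add_ps(acc, _mm256_loadu_ps(data + i));
  }
  // Reduce the eight partial sums, then handle the leftover elements.
  float lanes[8];
  _mm256_storeu_ps(lanes, acc);
  float total = 0.0f;
  for (float lane : lanes) total += lane;
  for (; i < n; ++i) total += data[i];
  return total;
}
#endif

using SumFn = float (*)(const float*, std::size_t);

// Picks the best version the running CPU supports.
SumFn ChooseSum() {
#if SKETCH_HAVE_X86_DISPATCH
  if (__builtin_cpu_supports("avx2")) return &SumAvx2;
#endif
  return &SumScalar;
}

// Public entry point: detection runs once, the result is reused afterwards.
float Sum(const float* data, std::size_t n) {
  static const SumFn chosen = ChooseSum();
  return chosen(data, n);
}
```

Callers only ever see `Sum(data, n)`; which version runs depends on the machine the binary lands on, not on the flags it was built with.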


Replies

William_BB, last Saturday at 11:39 AM

Thank you so much, this explains it well. I was initially afraid that the dispatch would be costly, but from what I understand it's (almost) zero cost after the first call.
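One common way to get that "pay once" behavior is a function pointer that initially points at a resolver stub. The sketch below shows the pattern in plain C++ under stated assumptions: it is not taken from Highway, and I have not verified that Highway works exactly this way. `CpuHasWideVectors` is a hypothetical stand-in for a real CPU feature probe, and the two `Scale*` functions stand in for per-target builds of the same code.

```cpp
#include <atomic>
#include <cstddef>

using ScaleFn = void (*)(float*, std::size_t, float);

// Two stand-in "targets". In a real build these would be the same source
// compiled for different instruction sets; here they are plain loops.
void ScaleBaseline(float* data, std::size_t n, float factor) {
  for (std::size_t i = 0; i < n; ++i) data[i] *= factor;
}
void ScaleWide(float* data, std::size_t n, float factor) {
  for (std::size_t i = 0; i < n; ++i) data[i] *= factor;
}

// Hypothetical feature probe; a real one would query CPUID or the OS.
bool CpuHasWideVectors() { return false; }

// Only here to show how often detection runs.
std::atomic<int> g_resolve_count{0};

void ScaleResolver(float* data, std::size_t n, float factor);

// Starts out pointing at the resolver, so the first call does the detection.
std::atomic<ScaleFn> g_scale{&ScaleResolver};

// First-call path: detect, overwrite the pointer, then forward the call.
void ScaleResolver(float* data, std::size_t n, float factor) {
  g_resolve_count.fetch_add(1, std::memory_order_relaxed);
  ScaleFn chosen = CpuHasWideVectors() ? &ScaleWide : &ScaleBaseline;
  g_scale.store(chosen, std::memory_order_relaxed);
  chosen(data, n, factor);
}

// Steady-state path: one pointer load plus one indirect call.
inline void Scale(float* data, std::size_t n, float factor) {
  g_scale.load(std::memory_order_relaxed)(data, n, factor);
}
```

After the first call the resolver is out of the picture, so the remaining overhead is an indirect call that the compiler cannot inline. That matters for tiny functions called in a hot loop, which is why dispatch is usually placed around a whole loop rather than around a per-element helper.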

I only code for x86 with the vectorclass library, so I never had to worry about portability. In practice, is it really possible to write generic SIMD code like the example using Highway? Or would you often find optimization opportunities if you targeted a particular architecture?

jeffreygoesto, last Saturday at 12:58 PM

Nice. The first time I saw this kind of dynamic dispatch was in FFTW.