
janwas yesterday at 2:23 PM

I made the same argument a while ago but a coworker changed my mind.

Can you afford to write and maintain a codepath per ISA (knowing that more keep coming, including RVV, LASX and HVX), to squeeze out the last X%? Is there no higher-impact use of developer time? If so, great.

If not, what's the alternative - scalar code? I'd think decent portable SIMD code is still better than nothing, and nothing (scalar) is all we have for new ISAs which have not yet been hand-optimized. So it seems we should anyway have a generic SIMD path, in addition to any hand-optimized specializations.
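To make the "generic path plus optional specializations" layout concrete, here is a minimal sketch. It does not use Highway; the generic path is plain C++ written in fixed-width chunks as a stand-in for a portable SIMD library, and the names `SumGeneric`, `SumAvx2`, `Sum` and `kLanes` are hypothetical. Selection here happens at compile time via `__AVX2__`, which is an assumption made for brevity; a real library would more likely dispatch at runtime.

```cpp
#include <cstddef>
#if defined(__AVX2__)
#include <immintrin.h>
#endif

// Hypothetical lane count for the generic path (stand-in for a vector width).
constexpr std::size_t kLanes = 8;

// Generic path: always available, on every ISA, including ones nobody has
// hand-optimized yet. Independent per-lane accumulators give the compiler
// a shape it can map onto vector registers.
float SumGeneric(const float* data, std::size_t n) {
  float acc[kLanes] = {};
  std::size_t i = 0;
  for (; i + kLanes <= n; i += kLanes) {
    for (std::size_t l = 0; l < kLanes; ++l) acc[l] += data[i + l];
  }
  float total = 0.0f;
  for (std::size_t l = 0; l < kLanes; ++l) total += acc[l];
  for (; i < n; ++i) total += data[i];  // remainder elements
  return total;
}

#if defined(__AVX2__)
// Optional hand-written specialization, compiled only when the target
// supports it. Deleting this function leaves the generic path intact.
float SumAvx2(const float* data, std::size_t n) {
  __m256 acc = _mm256_setzero_ps();
  std::size_t i = 0;
  for (; i + 8 <= n; i += 8) {
    acc = _mm256_add_ps(acc, _mm256_loadu_ps(data + i));
  }
  float lanes[8];
  _mm256_storeu_ps(lanes, acc);
  float total = 0.0f;
  for (float v : lanes) total += v;
  for (; i < n; ++i) total += data[i];  // remainder elements
  return total;
}
#endif

// Entry point: use the specialization if one exists, else the generic path.
float Sum(const float* data, std::size_t n) {
#if defined(__AVX2__)
  return SumAvx2(data, n);
#else
  return SumGeneric(data, n);
#endif
}
```

The point of the layout is that callers only ever see `Sum`, so a specialization can be added or removed per ISA without touching them, and a new ISA gets the generic path for free.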

BTW, Highway indeed provides decent emulations of LD2..4, and at least 2-table lookups. Note that some Arm uarchs are anyway slow with 3 and 4.


Replies

anonymoushn yesterday at 4:07 PM

For now, at work, we just have some parts with AVX-512, some parts with AVX-512 that we can't really use (so we should use AVX2 on those), and some parts with NEON and SVE. So the SSE implementations are basically a courtesy to outside users of the libraries, and there are no RVV implementations.

If we were already depending on Highway or EVE, I would think it's great to ship the generic SIMD version instead of the SSE version, which probably compiles down to the same thing on the relevant targets. This way, if future maintainers need to make changes and don't want to deal with the several implementations I have left behind, the presence of the generic implementation would allow them to delete the specialized ones rather than making the same changes a bunch of times.
