logoalt Hacker News

londons_exploretoday at 7:50 AM4 repliesview on HN

This sort of analysis is great.

Now why can't compilers do this sort of thing automatically?

Almost any problem seems to be possible to speed up 1000x in AVX512+days of thought compared to the naive version written in a python loop. If we could automate that whole process for big codebases the performance gains could be huge.


Replies

lifthrasiirtoday at 9:10 AM

Intents matter. Compilers can't see through your skull to infer your intents and thus behave very conservatively unless you override that behavior somehow. This inference, alas, also takes (much) time, so compilers have to balance the compilation time with quality of intents guessed as well. (This is why we can't exactly use LLMs in mainstream compilers, by the way.) So go and make a programming language that preserves your intents by every means; but making it practical would be very difficult.

cmovqtoday at 8:04 AM

Compilers can’t really, in a meaningful way, change the layout of your data in memory. And you do need to think about your memory layout to get any benefit from SIMD. You’ll notice a lot of compiler auto vectorization insert many instructions just to shuffle data around to get to a usable layout, which negates much of the benefit.

show 1 reply
flohofwoetoday at 9:06 AM

> 1000x in AVX512+days of thought compared to the naive version written in a python loop

Out of this 1000x speedup you get 100x by just not using python though ;)

Also IIRC the main problem specifically with AVX512 was that mainstream CPUs simply didn't have it, so a smart compiler won't be of much use when the output code only runs on a handful devices.

diamondlovesyoutoday at 7:59 AM

> Now why can't compilers do this sort of thing automatically?

They do - they just can't assume GFNI instructions are present unless you explicitly say so: https://godbolt.org/z/eYasbKsse