Hacker News

Sharlin last Saturday at 11:46 AM

It’s such a ridiculous situation we’re in. Just about every consumer CPU of the past 20 years packs an extra order of magnitude or two of punch for data processing workloads, but to not let it go to waste you have to resort to writing your inner loops using low-level nonportable intrinsics that are just a step above assembly. Or pray that the gods of autovectorization are on your side.
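For readers who have not seen the two styles side by side, here is a minimal sketch of the contrast described above. It is an illustration, not code from the thread: the function names `add_scalar` and `add_simd` are hypothetical, and the intrinsics path assumes an x86 target with SSE (it falls back to the plain loop elsewhere).

```c
#include <stddef.h>

#if defined(__SSE__)
#include <xmmintrin.h>  /* SSE intrinsics; x86 only, hence nonportable */
#endif

/* Plain loop: whether it becomes SIMD code is up to the compiler. */
void add_scalar(const float *a, const float *b, float *out, size_t n) {
    for (size_t i = 0; i < n; i++) {
        out[i] = a[i] + b[i];
    }
}

/* Hand-written version: four floats per iteration via SSE intrinsics. */
void add_simd(const float *a, const float *b, float *out, size_t n) {
    size_t i = 0;
#if defined(__SSE__)
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);   /* unaligned load of 4 floats */
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(out + i, _mm_add_ps(va, vb));
    }
#endif
    /* Scalar tail (or the whole array when SSE is unavailable). */
    for (; i < n; i++) {
        out[i] = a[i] + b[i];
    }
}
```

Both functions compute the same result; the second is tied to one instruction set and would need a separate variant for AVX, NEON, and so on.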


Replies

jltsiren last Saturday at 12:54 PM

Adding parallelism is much easier on the hardware side than the software side. We've kind of figured out the easy cases, such as independent tasks with limited interactions with each other, and made them practical for the average developer. But nobody understands the harder cases (such as SIMD) well enough to create useful abstractions that don't constrain hardware development too much.

the__alchemist last Saturday at 1:05 PM

Evidence points towards the autovectorization gods being dead, false, or too weak. I hear, but don't believe, their prophets.

MangoToupe last Saturday at 11:47 AM

Well, yeah. You need to describe your data flow in a way the CPU can take advantage of. Compilers aren't magic.
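One common illustration of this point, sketched here under assumptions rather than taken from the thread: a floating-point sum with a single accumulator forms a serial dependency chain, and because float addition is not associative a compiler generally will not reorder it without relaxed math flags. Splitting the sum into independent accumulators states the parallelism explicitly. The names `sum_serial` and `sum_lanes` and the lane count of 4 are hypothetical choices.

```c
#include <stddef.h>

/* One accumulator: every addition depends on the previous one. */
float sum_serial(const float *x, size_t n) {
    float acc = 0.0f;
    for (size_t i = 0; i < n; i++) {
        acc += x[i];
    }
    return acc;
}

/* Four independent accumulators: the inner loop maps naturally
   onto a 4-wide vector add, with no dependency between lanes. */
float sum_lanes(const float *x, size_t n) {
    float acc[4] = {0.0f, 0.0f, 0.0f, 0.0f};
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        for (int k = 0; k < 4; k++) {
            acc[k] += x[i + k];
        }
    }
    /* Combine the lanes, then add any leftover elements. */
    float total = (acc[0] + acc[1]) + (acc[2] + acc[3]);
    for (; i < n; i++) {
        total += x[i];
    }
    return total;
}
```

Note that the two functions add in a different order, so their results can differ in the last bits for general inputs; whether the second version is actually vectorized still depends on the compiler and flags.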
