> Also, FTA: “and arguably the whole scheme should be replaced by finer-grained feature detection”. Such feature detection would lead to a combinatorial explosion of different binaries.
The thread is about runtime detection, to be fair.
OK, then it will be an explosion of binary size if you have several code blocks optimized for each architecture level. I'm not very familiar with the subject, but I imagine they would have to be relatively large chunks of code; otherwise the constant branching would eat up the speed advantage.
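For illustration, here is a minimal Rust sketch of that trade-off: the CPU check happens once per call to a reasonably large routine rather than inside the hot loop. The function names (`sum_scalar`, `sum_avx2`, `sum`) are made up for this example; `is_x86_feature_detected!` and `#[target_feature]` are the standard library's mechanisms, and whether the compiler actually emits faster code for the AVX2 variant is an assumption, not something this sketch guarantees.

```rust
/// Portable fallback: works on any CPU.
fn sum_scalar(data: &[u32]) -> u64 {
    data.iter().map(|&x| x as u64).sum()
}

/// Same source, but compiled with AVX2 enabled so the compiler is
/// allowed to use AVX2 instructions. This is a second copy of the
/// code in the binary, which is where the size growth comes from.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2")]
unsafe fn sum_avx2(data: &[u32]) -> u64 {
    data.iter().map(|&x| x as u64).sum()
}

/// Dispatcher: one feature check per call, not per element.
fn sum(data: &[u32]) -> u64 {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx2") {
            // Safe to call: we just verified the CPU supports AVX2.
            return unsafe { sum_avx2(data) };
        }
    }
    sum_scalar(data)
}
```

Calling `sum(&values)` gives the same result on either path; only the machine code that runs differs. Each extra feature level handled this way adds another copy of the routine to the binary.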
I see I wasn’t clear enough. The tool I discussed generates multiple binaries and then packs all of them into a single binary. I was referring to the former.
https://github.com/ronnychevalier/cargo-multivers:
“After building the different versions, it computes a hash of each version and it filters out the duplicates (i.e., the compilations that gave the same binaries despite having different CPU features). Finally, it builds a runner that embeds one version compressed (the source) and the others as compressed binary patches to the source. For instance, when building for the target x86_64-pc-windows-msvc, by default 4 different versions will be built, filtered, compressed, and merged into a single portable binary.
When executed, the runner uncompresses and executes the version that matches the CPU features of the host.”
Hopefully (and likely) the patches will not be too large, but for 6 binary (on/off) compiler flags, you’d still have up to 2⁶ = 64 binaries to build.
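To make the count concrete, a small Rust sketch that enumerates every on/off combination of a flag list. The helper `feature_combinations` and the particular six flags are picked purely for illustration; this is not how cargo-multivers chooses what to build (per the README quoted above, it builds 4 versions by default for that target and drops duplicates).

```rust
/// Returns every subset of `flags`, one per on/off combination.
/// For n flags this yields 2^n entries.
fn feature_combinations<'a>(flags: &[&'a str]) -> Vec<Vec<&'a str>> {
    let n = flags.len();
    (0..(1usize << n))
        .map(|mask| {
            flags
                .iter()
                .enumerate()
                // Keep flag i when bit i of the mask is set.
                .filter(|&(i, _)| (mask & (1usize << i)) != 0)
                .map(|(_, f)| *f)
                .collect()
        })
        .collect()
}
```

With six flags such as `["sse4.2", "avx", "avx2", "fma", "bmi2", "popcnt"]`, the returned list has 64 entries, from the empty set (baseline build) to all six enabled. In practice many combinations would compile to identical binaries and be filtered out, so 64 is an upper bound on what ends up embedded.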