I see I wasn’t clear enough. The tool I discussed generates multiple binaries and then packs all of them into a single binary. I was referring to the former.
https://github.com/ronnychevalier/cargo-multivers:
“After building the different versions, it computes a hash of each version and it filters out the duplicates (i.e., the compilations that gave the same binaries despite having different CPU features). Finally, it builds a runner that embeds one version compressed (the source) and the others as compressed binary patches to the source. For instance, when building for the target x86_64-pc-windows-msvc, by default 4 different versions will be built, filtered, compressed, and merged into a single portable binary.
When executed, the runner uncompresses and executes the version that matches the CPU features of the host.”
Hopefully (and likely) the patches will not be too large, but for 6 independent on/off compiler flags, you’d still have up to 2⁶ = 64 binaries to build before any duplicates are filtered out.
Yeah, but that's because of pragmatic choices to limit the scope of the tool. In the wider context of "I've long been surprised there isn't more multiversion stuff built right into every language compile" it's easy to imagine a compiler that can heuristically detect which functions would benefit from certain CPU features, and walk over the call graph to find locations for runtime feature detection that balance detection overhead against code duplication for the fallback functions. For example, merging the feature detection of adjacent function calls, making sure feature detection is moved out of hot loops, etc.
Obviously this is much easier to imagine than to implement. And in some languages it might be made impossible by certain language features (function pointers might become tricky). But this is more or less what some people do by hand in Rust with the more manual is_x86_feature_detected! macro, so there's no obvious reason why compilers couldn't automate it in at least some languages.
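For anyone who hasn't seen the by-hand version, here is a minimal sketch of what that usually looks like. The function names (sum, sum_scalar, sum_avx2) and the choice of AVX2 are just assumptions for illustration, not taken from cargo-multivers or any particular crate. The same body is compiled twice, once with AVX2 enabled, and a single runtime check picks between them:

```rust
// Portable fallback, compiled with the crate's baseline target features.
fn sum_scalar(xs: &[f32]) -> f32 {
    xs.iter().sum()
}

// Same body, but the compiler is allowed to use AVX2 instructions here.
// It is `unsafe` because calling it on a CPU without AVX2 is undefined behavior.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2")]
unsafe fn sum_avx2(xs: &[f32]) -> f32 {
    xs.iter().sum()
}

// Dispatcher: one feature check per call, placed outside the per-element loop.
pub fn sum(xs: &[f32]) -> f32 {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx2") {
            // Sound only because the check above confirmed AVX2 is available.
            return unsafe { sum_avx2(xs) };
        }
    }
    // Non-x86_64 targets, or x86_64 CPUs without AVX2, end up here.
    sum_scalar(xs)
}
```

Calling sum(&data) once over a whole slice keeps the detection out of the hot loop, which is the placement decision a compiler would have to make automatically. In this hand-written form the programmer picks the dispatch point and has to keep the fallback and the specialized version in sync.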