Most vaguely recent compilers will convert naively looping through bits into a native POPCOUNT instruction. The parallel bit count algorithm was not reliably detected until more recently and therefore would sometimes produce unoptimized code, though current versions of gcc/clang/msvc can all detect it now.
Also, pretty much every compiler for a very long time has supported __builtin_popcount or equivalent.