You have to pair rdtsc with a serializing instruction like cpuid. Otherwise the CPU can execute rdtsc out of order and read the TSC before the loop has finished. Intel's docs have changed since then and there are more options now:
https://stackoverflow.com/a/58146426
Also, in the bad old days, SMM (System Management Mode) could interfere with the measurement on some CPUs.
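
For illustration, here is a rough sketch of one of the newer options, using lfence and rdtscp instead of cpuid. It assumes GCC or Clang on x86-64 with the intrinsics from `<x86intrin.h>`. The helper names `tsc_begin`, `tsc_end` and `time_loop` are made up for this example, and the fence placement is one common pattern rather than the only valid one.

```c
#include <stdint.h>
#include <x86intrin.h>  /* GCC/Clang header; assumed toolchain */

/* Hypothetical helper: read the TSC at the start of a timed region.
 * The first LFENCE makes RDTSC wait until earlier instructions have
 * completed; the second keeps the timed code from starting before
 * the TSC has been read. */
static inline uint64_t tsc_begin(void)
{
    _mm_lfence();
    uint64_t t = __rdtsc();
    _mm_lfence();
    return t;
}

/* Hypothetical helper: read the TSC at the end of a timed region.
 * RDTSCP waits for all previous instructions to execute before it
 * reads the counter; the LFENCE stops later instructions from
 * starting before that read. */
static inline uint64_t tsc_end(void)
{
    unsigned int aux;  /* receives IA32_TSC_AUX, unused here */
    uint64_t t = __rdtscp(&aux);
    _mm_lfence();
    return t;
}

/* Time a simple loop in TSC ticks. The volatile sink keeps the
 * compiler from optimizing the loop away. */
static uint64_t time_loop(unsigned long iterations)
{
    volatile unsigned long sink = 0;
    uint64_t start = tsc_begin();
    for (unsigned long i = 0; i < iterations; i++) {
        sink += i;
    }
    uint64_t end = tsc_end();
    return end - start;
}
```

Calling `time_loop(1000000)` returns the elapsed TSC ticks for the loop. The result is in reference cycles, not core clock cycles, and it can still be inflated by interrupts or by the thread migrating between cores, so take the minimum over several runs.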