The operation is slightly more complex, yes, but has there ever been an x86 CPU where SUB or XOR takes more than a single CPU cycle?
I wonder if you could measure the difference in power consumption.
I mean, not for zeroing, because we know from TFA that it's special-cased anyway. But maybe if you test with two different registers as operands?