dzaima • 10/02/2024

Note that not only are multiple consecutive increments reduced to zero latency, this also happens when they're interleaved with movsxd, as in the second experiment at https://uops.info/html-lat/ADL-P/INC_R64-Measurements.html. It'd be interesting to see which other instructions it can "fuse" with (if that is what is happening).

Replies

rep_lodsb • 10/02/2024

Also interesting that this only happens with 64-bit registers: https://uops.info/html-lat/ADL-P/INC_R32-Measurements.html

I don't see why this should be the case: with a 32-bit operation the high bits of the result would simply be cleared, and using 32-bit operations is a common size optimization.
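To make the zero-extension point concrete, here is a minimal Python sketch of the architectural semantics only (it says nothing about the latencies uops.info measures). The names `inc_r64` and `inc_r32` are made up for illustration; they model `inc rax` and `inc eax` acting on the full 64-bit register value.

```python
MASK32 = 0xFFFF_FFFF
MASK64 = 0xFFFF_FFFF_FFFF_FFFF


def inc_r64(reg: int) -> int:
    """Model `inc rax`: 64-bit increment, wrapping at 2**64."""
    return (reg + 1) & MASK64


def inc_r32(reg: int) -> int:
    """Model `inc eax`: increment the low 32 bits (wrapping at 2**32).

    Writing a 32-bit register on x86-64 zero-extends into the full
    64-bit register, so bits 63..32 of the result are always 0.
    """
    return (reg + 1) & MASK32


rax = 0xDEAD_BEEF_0000_0041
print(hex(inc_r64(rax)))  # high half preserved
print(hex(inc_r32(rax)))  # high half cleared
```

So in terms of the result, the 32-bit form is just the 64-bit increment followed by clearing the upper half, which is why the difference in the measurements looks surprising.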

Maybe https://news.ycombinator.com/item?id=41706743 is correct, and this is mainly intended for address increments generated by microcode?
