Also interesting that this only happens with 64-bit registers: https://uops.info/html-lat/ADL-P/INC_R32-Measurements.html
I don't see why that should be the case, since the high bits of the result would simply be cleared, and using 32-bit operations is a common size optimization.
Maybe https://news.ycombinator.com/item?id=41706743 is correct, and this is mainly intended for address increments generated by microcode?
Interesting. I wonder how interleaved 'inc r64' + 'mov r32,r32' would look: that's two separate zero-latency ops which together are equivalent to 'inc r32'. I wouldn't be too surprised if an eliminated op can be either zero-extending or incrementing, but not both.
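To spell out the equivalence being assumed there, here is a rough C model of the register results only. The function names are made up for illustration, and it ignores flags: 'inc r32' sets them from the 32-bit result, while the pair sets them from the 64-bit increment, so the two sequences are not identical in that respect. It also says nothing about what the renamer actually does.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of 'inc r32': increment the low 32 bits,
   then zero-extend into the full 64-bit register. */
static uint64_t inc_r32(uint64_t reg) {
    return (uint64_t)(uint32_t)((uint32_t)reg + 1u);
}

/* Hypothetical model of 'inc r64': plain 64-bit increment (wraps on overflow). */
static uint64_t inc_r64(uint64_t reg) {
    return reg + 1u;
}

/* Hypothetical model of 'mov r32,r32' on the same register:
   keep the low 32 bits, clear the high 32 bits. */
static uint64_t mov_r32_r32(uint64_t reg) {
    return (uint64_t)(uint32_t)reg;
}

/* The two-instruction sequence from the comment above. */
static uint64_t inc_r64_then_mov_r32(uint64_t reg) {
    return mov_r32_r32(inc_r64(reg));
}

/* Checks a few sample values, including 32-bit and 64-bit wraparound. */
static void check_samples(void) {
    const uint64_t samples[] = {
        0u, 1u, 0xFFFFFFFFull, 0x1234567800000001ull,
        0xFFFFFFFFFFFFFFFFull, 0x00000001FFFFFFFFull
    };
    for (unsigned i = 0; i < sizeof samples / sizeof samples[0]; i++) {
        assert(inc_r32(samples[i]) == inc_r64_then_mov_r32(samples[i]));
    }
}
```

Calling check_samples() passes for these inputs, which is all the "equivalent to 'inc r32'" claim needs as far as the destination register goes.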