Hacker News

kevinday, yesterday at 9:28 PM

I think people started doing that after one of the Intel SSE examples did it and everyone just copied it.

But on any modern CPU there should be essentially no penalty for doing that now. Testing the full register is basically free as long as you aren't doing a partial write followed by a full read (e.g. writing AH and then reading AX), and I don't think there's any case where this could stall on anything newer than a Core 2-era processor. Still, just replacing it with a "jnc" or whatever exactly you're trying to test for would at least mean fewer instructions. I'd love to see benchmarks, though, if someone has dug deeper into this than I have.
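A minimal sketch of the "fewer instructions" point, in Python. The thread doesn't show the exact sequence being discussed, so the pair below is a hypothetical assumption: copy the carry flag into AL with `setc`, zero-extend it, test the full register and branch, versus a single `jc` on the flag itself. The byte encodings are the standard x86 ones for a short (rel8) branch; the names `VIA_REGISTER`, `DIRECT_BRANCH` and `footprint` are made up for illustration.

```python
# Hypothetical illustration: the exact sequence from the thread is not known,
# so these two x86 sequences are assumptions. Each entry is
# (mnemonic, encoded bytes); rel8 branch offsets are left as 0x00.

# Materialize the carry flag into a register, then test the full register.
VIA_REGISTER = [
    ("setc al", bytes([0x0F, 0x92, 0xC0])),
    # zero-extend so the full-register test below doesn't read a
    # partially written EAX (the partial-write/full-read case)
    ("movzx eax, al", bytes([0x0F, 0xB6, 0xC0])),
    ("test eax, eax", bytes([0x85, 0xC0])),
    ("jnz target", bytes([0x75, 0x00])),
]

# Branch on the carry flag directly.
DIRECT_BRANCH = [
    ("jc target", bytes([0x72, 0x00])),
]


def footprint(seq):
    """Return (instruction count, total encoded bytes) for a sequence."""
    return len(seq), sum(len(enc) for _, enc in seq)


if __name__ == "__main__":
    for name, seq in (("via register", VIA_REGISTER), ("direct branch", DIRECT_BRANCH)):
        count, size = footprint(seq)
        print(f"{name}: {count} instructions, {size} bytes")
```

Instruction and byte counts aren't the same as micro-op counts: many modern x86 cores macro-fuse a `test` with an immediately following conditional jump into a single micro-op, so the runtime difference may well be smaller than these numbers suggest.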


Replies

anyfoo, yesterday at 11:24 PM

Unless the instances are sparse, higher code density is of course always better, because of the instruction cache (and the micro-op cache, if this doesn't get "peephole optimized" away or something like that; I know nothing about the micro-op cache).

But yeah, it may not make a real impact yet anyway.
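A back-of-the-envelope sketch of why sparseness matters, in Python. All inputs are assumptions for illustration: a 32 KiB L1 instruction cache (actual sizes vary by core) and 8 bytes saved per instance, e.g. a hypothetical 10-byte flag-materializing sequence replaced by a 2-byte conditional branch. It ignores alignment, code layout and the micro-op cache; `l1i_share_saved` is a made-up helper.

```python
# Back-of-the-envelope sketch: what share of an L1 instruction cache does a
# per-instance code-size saving add up to? All inputs are assumptions.
L1I_BYTES = 32 * 1024  # assumed 32 KiB L1i; actual size varies by core


def l1i_share_saved(bytes_saved_per_instance: int, hot_instances: int) -> float:
    """Fraction of the assumed L1i freed if every hot instance shrinks."""
    return bytes_saved_per_instance * hot_instances / L1I_BYTES


if __name__ == "__main__":
    # Hypothetical: 8 bytes saved per instance on the hot path.
    for n in (1, 10, 100, 1000):
        print(f"{n:>5} hot instances: {l1i_share_saved(8, n):.2%} of L1i")
```

A handful of instances barely registers, while hundreds of them on a hot path start to take up a noticeable slice of the cache.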