Back in 2005 or 2006, I was working at a little startup with "DVD Jon" Johansen and we'd have Quake 3 tournaments to break up the monotony of reverse-engineering and juggling storage infrastructure. His name was always "xor eax,eax" and I always just had to laugh at the idea of getting zeroed out by someone with that name. (Which happened a lot -- I was good, but he was much better!)
> Unlike other partial register writes, when writing to an e register like eax, the architecture zeros the top 32 bits for free.
I’m familiar with 32-bit x86 assembly from writing it 10-20 years ago. So I was aware of the benefit of xor in general, but the above quote was new to me.
I don’t have any experience with 64-bit assembly - is there a guide anywhere that teaches 64-bit specifics like the above? Something like “x64 for those who know x86”?
I remember a lot of code zeroing registrers, dating at least back from the IBM PC XT days (before the 80286).
If you decode the instruction, it makes sense to use XOR:
- mov ax, 0 - needs 4 bytes (66 b8 00 00) - xor ax,ax - needs 3 bytes (66 31 c0)
This extra byte in a machine with less than 1 Megabyte of memory did id matter.
In 386 processors it was also - mov eax,0 - needs 5 bytes (b8 00 00 00 00) - xor eax,eax - needs 2 bytes (31 c0)
Here Intel made the decision to use only 2 bytes. I bet this helps both the instruction decoder and (of course) saves more memory than the old 8086 instruction.
In modern CPUs, a lot of these are recognized as zeroing idioms and they end up doing the same thing (often a register renaming trick). Using the shortest one makes sense. If you use a really weird zeroing pattern, you can also see it as a backend uop while many of these zeroing idioms are elided by the frontend on some cores.
> In my 6502 hacking days, the presence of an exclusive OR was a sure-fire indicator you’d either found the encryption part of the code, or some kind of sprite routine.
Meanwhile, people like me who got started with a Z80 instead immediately knew why, since XOR A is the smallest and fastest way to clear the accumulator and flag register. Funny how that also shows how specific this is to a particular CPU lineage or its offshoots.
It's funny how machine code is a high level language nowadays, for this example the CPU recognizes the zeroing pattern and does something quite a bit different.
> By using a slightly more obscure instruction, we save three bytes every time we need to set a register to zero
Meanwhile, most "apps" we get nowadays contain half of npmjs neatly bundled in electron. I miss the days when default was native and devs had constraints to how big their output could be.
It happens to be the first instruction of the first snippet in the wonderful xchg rax,rax.
I'd like to learn about the earliest pronunciations of these instructions. Only because watching a video earlier, I heard "MOV" pronounced "MAUV" not "MOVE"
Not sure exactly how I could dig up pronunciations, except finding the oldest recordings
Matt Godbolt also uploads to his self titled Youtube channel: https://www.youtube.com/watch?v=eLjZ48gqbyg
> Interestingly, when zeroing the “extended” numbered registers (like r8), GCC still uses the d (double width, ie 32-bit) variant.
Of course. I might have some data stored in the higher dword of that register.
Because "sub eax,eax" looks stupid. (and also clears the carry flag, unlike "xor eax, eax")
similarly IIRC, on (some generations of) x86 chips, NOP is sugar around `XCHG EAX, EAX` which is effectively a do-nothing operation
The actually surprising part to me is that such an important instruction uses a two byte encoding instead of one byte :)
Also cool this got at the top item on the HN front page
I've wrote a lot of `xor al,al` in my youth.
Back on the Z80 'xor a' is the shortest sequence to zero A
Back when I did IBM 370 BAL Assembly Language, we did the same thing to clear a register to zero.
XR 15,15 XOR REGISTER 15 WITH REGISTER 15
vs L 15,=F'0' LOAD REGISTER 15 WITH 0
This was alleged to be faster on the 370 because because XR operated entirely within the CPU registers, and L (Load) fetched data from memory (i.e.., the constant came from program memory).My brain read this is "Why not ear wax?"
Because mov eax, 0 requires fetching a constant and prolongs instruction fetching/execution. XOR A was a trick I learned back in the Z80 days.
Remnant of RISC attempt without a zero register.
[dead]
The page crashes after 3 seconds, 100% of the time, on the latest version of Android Chrome and works fine on Brave, fyi.
In my 6502 hacking days, the presence of an exclusive OR was a sure-fire indicator you’d either found the encryption part of the code, or some kind of sprite routine.
Yeah, sadly the 6502 didn't allow you to do EOR A; while the Z80 did allow XOR A. If I remember correctly XOR A was AF and LD A, 0 was 3E 01[1]. So saved a whole byte! And I think the XOR was 3 clock cycles fast than the LD. So less space taken up by the instruction and faster.
I have a very distinct memory in my first job (writing x86 assembly) of the CEO walking up behind my desk and pointing out that I'd done MOV AX, 0 when I could have done XOR AX, AX.
[1] 3E 00