> The obvious answer is that XOR is faster.
It used to be not only faster but also smaller. And back then this mattered.
Say you had a computer running at 33 Mhz, you had 33 million cycles per second to do your stuff. A 60 Hz game? 33 million / 60 and suddenly you only have about 500 000 cycles per frame. 200 scanlines? Suddenly you're left with only 2500 cycles per scanline to do your stuff. And 2500 cycles really isn't that much.
So every cycle counted back then. We'd use the official doc and see how many cycles each instruction would take. And we'd then verify by code that this was correct too. And memory mattered too.
XOR was both faster and smaller (less bytes) then a MOV ..., 0.
Full stop.
And when those CPU first began having cache, the cache were really tiny at first: literally caching ridiculously low number of CPU instructions. We could actually count the size of the cache manually (for example by filling with a few NOP instructions then modifying them to, say, add one, and checking which result we got at the end).
XOR, due to being smaller, allowed to put more instructions in the cache too.
Now people may lament that it persisted way long after our x86 CPUs weren't even real x86 CPUs anymore and that is another topic.
But there's a reason XOR was used and people should deal with it.
We zero with XOR EAX,EAX and that's it.
The context was comparison to SUB EAX,EAX, not to a MOV.