I think an even more likely explanation is that x86 assembly programmers often either came from other architectures or learned from people who had. Maybe there's an architecture where the idiom makes more sense, and it can be attributed to that. The 6502 and 68k are the first places I would look.
For 68k it mostly doesn't matter, though it depends on the operand size you're interested in.
.b and .w -> clr, eor and sub are all identical
for .l, moveq #0 is the winner
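To make the 68k case concrete, here's a rough sketch of the candidates for clearing d0. The cycle counts in the comments are recalled from the plain 68000 timing tables, not measured, so treat them as an assumption; later parts (68020 and up) behave differently.

```
; clearing d0 on a plain 68000 (cycle counts recalled, not measured)
        clr.w   d0          ; 4 cycles, low word only
        eor.w   d0,d0       ; 4 cycles
        sub.w   d0,d0       ; 4 cycles

        clr.l   d0          ; 6 cycles
        eor.l   d0,d0       ; 8 cycles
        sub.l   d0,d0       ; 8 cycles
        moveq   #0,d0       ; 4 cycles, always writes all 32 bits
```

Note that moveq has no .b/.w form: it sign-extends its 8-bit immediate to the full 32 bits, so it only works as a replacement when clobbering the upper bits is fine.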
The 6502 doesn't even have register-to-register ALU operations, so there's no alternative to LDA #0.
8080/Z80 is probably where XOR A got a lead over SUB A, but there too they take the same number of cycles.