That comment is not very useful without pointing to real-world CPUs where SUB is more expensive than XOR ;)
E.g. on the Z80 and 6502, both have the same cycle count.
On the Cortex-A8, vsub reads its second source register a cycle earlier than veor, so that can add one cycle of latency.
Not scalar, but still sub vs xor. Though you’d use vmov immediate for zeroing anyway.
Harvard Mark I? Not sure why people think programming started with Z80.
The 6502 doesn't support XOR A or SUB A, and in fact doesn't have a SUB opcode at all, only SBC (subtract with carry, requiring an extra opcode to set the carry flag beforehand).
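To make the SBC point concrete, here is a rough Python sketch of how the carry flag feeds into the subtraction. The `sbc` helper is made up for illustration and only models binary (non-decimal) mode, ignoring the other flags:

```python
def sbc(a, m, carry):
    """Hypothetical model of 6502 SBC in binary mode.

    Computes A - M - (1 - C) on 8-bit values and returns
    (result, carry_out), where carry_out is 1 when no borrow occurred.
    """
    diff = a - m - (1 - carry)
    return diff & 0xFF, 1 if diff >= 0 else 0


# Carry set beforehand (as SEC would do): behaves like a plain subtract.
print(sbc(0x42, 0x42, 1))  # (0, 1)

# Carry left clear: the result is off by one and a borrow is signalled.
print(sbc(0x42, 0x42, 0))  # (255, 0)
```

So subtracting a value from itself only gives zero if the carry was set first, which is where the extra instruction comes from.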