Modern x86 CPUs have actual instructions for strcpy that work fairly well. There were several false starts along the way, but the performance is fine now.
The spec and some sanitizers use a scalar loop (because they need to avoid mistakenly detecting UB, such as a read past the terminator), but real-world libc implementations are unlikely to use a scalar loop.
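To illustrate the difference, here is a minimal sketch. It is not taken from any particular libc or sanitizer. `scalar_strcpy` is the byte-at-a-time loop, which touches exactly the bytes up to and including the terminator. `word_strcpy` is a hypothetical word-at-a-time variant that loads 8 bytes at a time and uses the classic "has zero byte" bit trick to find the terminator, so its final load may read bytes past the end of the string. The contract that `src` is readable up to the next 8-byte boundary after its terminator is an assumption made for this sketch. Real optimized implementations typically rely on aligned loads never crossing a page boundary instead, and they usually use SIMD rather than 64-bit words.

```c
#include <stdint.h>
#include <string.h>

/* Byte-at-a-time copy: reads exactly the bytes up to and including the
   terminator, so it never touches memory past the end of the string. */
char *scalar_strcpy(char *dst, const char *src) {
    char *d = dst;
    while ((*d++ = *src++) != '\0')
        ;
    return dst;
}

/* Nonzero iff at least one byte of v is 0x00 (classic SWAR trick). */
static int has_zero_byte(uint64_t v) {
    return ((v - 0x0101010101010101ULL) & ~v & 0x8080808080808080ULL) != 0;
}

/* Hypothetical word-at-a-time copy.
   ASSUMPTION: src is readable up to the next 8-byte boundary after its
   terminator; the aligned 8-byte load below may read past the '\0'. */
char *word_strcpy(char *dst, const char *src) {
    char *d = dst;

    /* Copy single bytes until src is 8-byte aligned. */
    while (((uintptr_t)src & 7) != 0) {
        if ((*d++ = *src++) == '\0')
            return dst;
    }

    /* Copy whole 8-byte words until one contains the terminator. */
    for (;;) {
        uint64_t w;
        memcpy(&w, src, sizeof w); /* may read bytes beyond the '\0' */
        if (has_zero_byte(w))
            break;
        memcpy(d, &w, sizeof w);
        d += sizeof w;
        src += sizeof w;
    }

    /* Finish the last word byte by byte, including the terminator. */
    while ((*d++ = *src++) != '\0')
        ;
    return dst;
}
```

The over-read in `word_strcpy` is harmless on real hardware under the stated assumption, but it is exactly the kind of access that a checker modeling the spec strictly would flag, which is why the scalar form is the natural choice there.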