You're looking at the wrong thing, WireGuard doesn't use AES, it uses ChaCha20. AES is really, really painful to implement securely in software only, and the result performs poorly. But ChaCha only uses addition rotation and XOR with 32 bit numbers and that makes it pretty performant even on fairly computationally limited devices.
For reference, I have an implementation of ChaCha20 running on the RP2350 at 100MBit/s on a single core at 150Mhz (910/64 = ~14.22 cycles per bytes). That's a lot for a cheap microcontroller costing around 1.5 bucks total. And that's not even taking into account using the other core the RP2350 has, or overclocking (runs fine at 300Mhz also at double the speed).
You’re totally right; I got myself spun around thinking AES instead of of ChaCha because the product I work on (ZeroTier) started with the initially and moved to AES later. I honestly just plain forgot that WireGuard hadn’t followed the same path.
An embarrassing slip, TBH. I’m gonna blame pre-holiday brain fog.