Compiler optimizations lag behind on any new hardware.
i. llvm presentation can thrash caches if set up wrong (given the plethora of fragmented RISC-V versions, most compilers won't cover every piece of vanity silicon).
ii. gcc is also "slow" in general, but is predictable/reliable
iii. in qemu, pure emulation (TCG) is always slower than kvm
It may seem silly, but I'd try a gcc build with the -O0 flag, and compile a toy unit test with -S to see if the generated ASM is actually FUBAR. You may have to force the -mtune=boom flag to narrow your search. Best regards =3
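
P.S. A minimal sketch of the kind of toy file I mean. The function name `dot` and the file name toy.c are made up for illustration; anything small enough to read in the assembly listing will do.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy kernel: one loop, one multiply-accumulate,
 * so the generated assembly stays short enough to read by eye. */
int64_t dot(const int32_t *a, const int32_t *b, size_t n)
{
    int64_t acc = 0;
    for (size_t i = 0; i < n; i++) {
        /* Widen before multiplying so the product is computed in 64 bits. */
        acc += (int64_t)a[i] * b[i];
    }
    return acc;
}
```

Something like `gcc -O0 -S toy.c -o toy_O0.s` and then `gcc -O2 -S toy.c -o toy_O2.s` gives two listings to compare (swap in your RISC-V cross compiler for plain gcc). If you add -mtune=boom, check first that your toolchain accepts that value, since I'm assuming it does.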