One thing compilers still struggle with is exploiting weird microarchitectural quirks or timing behaviors that aren't obvious from the ISA spec, especially around memory, cache, and pipeline tuning. If a new RISC-V core doesn't expose the same prefetching tricks or has odd branch prediction, you won't get parity just by porting the same backend. If you want peak numbers, sometimes you still need to tune libraries or even sprinkle in a bit of inline asm, despite all the "let the compiler handle it" dogma.
While true, that's typically not going to have much impact on overall system performance.
There's a reason, for example, why the Linux distros all target a generic x86 baseline rather than a specific microarchitecture.
The things you are talking about are taken care of by out-of-order execution and the CPU itself being smart about how it executes. Putting in prefetch instructions rarely beats the hardware prefetcher. Compilers never ended up generating perfect Pentium asm either. Out-of-order (OoO) execution is what changed the game: you no longer need perfect compiler output.
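If anyone wants to check the prefetch claim on their own hardware, here is a rough sketch of the kind of comparison I mean. It assumes GCC or Clang (for `__builtin_prefetch`); the function names and the `PREFETCH_DISTANCE` value are made up for illustration, not taken from any real library. On a simple sequential walk like this I'd expect the two versions to perform about the same on most modern cores, but that is something to measure, not assume.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical tuning knob: how many elements ahead to prefetch.
 * The right value (if any) depends on the core and the access pattern. */
#define PREFETCH_DISTANCE 64

/* Plain loop: relies entirely on the hardware prefetcher. */
uint64_t sum_plain(const uint32_t *data, size_t n) {
    uint64_t total = 0;
    for (size_t i = 0; i < n; i++) {
        total += data[i];
    }
    return total;
}

/* Same loop with an explicit software prefetch hint.
 * __builtin_prefetch(addr, rw, locality): rw = 0 means read,
 * locality = 0 means the data need not stay in cache after use. */
uint64_t sum_prefetch(const uint32_t *data, size_t n) {
    uint64_t total = 0;
    for (size_t i = 0; i < n; i++) {
        if (i + PREFETCH_DISTANCE < n) {
            __builtin_prefetch(&data[i + PREFETCH_DISTANCE], 0, 0);
        }
        total += data[i];
    }
    return total;
}
```

Both functions return the same sum; the only difference is the hint. To compare them, time each one over a buffer much larger than the last-level cache and repeat the run a few times. Irregular access patterns such as pointer chasing are where manual prefetching has a better chance of helping, since the hardware prefetcher can't guess the next address.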