>despite not yet reaching bare-metal levels of performance and energy efficiency.
"Not yet"? It will never reach "bare-metal levels of performance and energy efficiency".
I'd have to take a contrary view on that. It'll take some time for the technologies to be developed, but ultimately managed JIT compilation has the potential to exceed native compiled speeds. It'll be a fun journey getting there though.
The initial order-of-magnitude jump in perf that JITs provided took us from the 2-5x overhead of managed runtimes down to some (1 + delta)x. That was driven by runtime type inference combined with a type-aware JIT compiler.
I expect that there's another significant but smaller perf jump that we haven't really plumbed yet, mostly to be gained from dynamic _value_ inference that's sensitive to _transient_ meta-stability in values flowing through the program.
Basically you can gather actual values flowing through code at runtime, look for patterns, and then inline / type-specialize those by deriving runtime types that are _tighter_ than the annotated types.
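To make that concrete, here's a toy sketch in Python of the profile / specialize / guard / deopt loop. It's only an illustration of the idea, not how any real engine does it: `ValueSpecializer`, `power_generic`, `make_power_specialized`, and the `warmup` / `threshold` knobs are all made-up names, and `eval` stands in for actual code generation.

```python
from collections import Counter


class ValueSpecializer:
    """Toy model: profile one argument, specialize on its hot value, guard it."""

    def __init__(self, generic, make_specialized, warmup=100, threshold=0.9):
        self.generic = generic                    # slow path, always correct
        self.make_specialized = make_specialized  # builds a fast path for one value
        self.warmup = warmup                      # calls to observe before deciding
        self.threshold = threshold                # fraction of calls the value must cover
        self.seen = Counter()
        self.calls = 0
        self.hot_value = None
        self.fast = None

    def _reset_profile(self):
        self.seen.clear()
        self.calls = 0

    def __call__(self, x, k):
        if self.fast is not None:
            if k == self.hot_value:
                return self.fast(x)   # guard passed: take the specialized path
            # Guard failed: the value was only transiently stable, so deoptimize.
            self.fast = None
            self.hot_value = None
            self._reset_profile()

        # Profiling: record which values of k actually flow through.
        self.seen[k] += 1
        self.calls += 1
        if self.calls >= self.warmup:
            value, count = self.seen.most_common(1)[0]
            if count / self.calls >= self.threshold:
                self.hot_value = value
                self.fast = self.make_specialized(value)
            else:
                self._reset_profile()
        return self.generic(x, k)


def power_generic(x, k):
    """Generic path: k is only known to be a non-negative int."""
    result = 1
    for _ in range(k):
        result *= x
    return result


def make_power_specialized(k):
    """'Compile' a version with the loop fully unrolled for one concrete k."""
    expr = " * ".join(["x"] * k) if k > 0 else "1"
    return eval(f"lambda x: {expr}")
```

Wrap it as `power = ValueSpecializer(power_generic, make_power_specialized, warmup=10)` and call `power(x, 3)` a bunch of times: after the warmup it switches to the unrolled `x * x * x` version behind a `k == 3` guard, and the first call with a different `k` falls back to the generic path and starts profiling again. The "type" being exploited here is "k is exactly 3", which is tighter than anything an `int` annotation could tell a static compiler.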
I think there's a reasonable amount of juice left in combining those techniques with partial specialization and JIT compilation, and that should get us over the hump from "slightly slower than native" to "slightly faster than native".
I get it's an outlier viewpoint though. Whenever I hear "managed jitcode will never be as fast as native", I interpret that as a friendly bet :)
Why? My only guess is that the instructions don't match x86 instructions well (way too few Wasm instructions) and the runtime doesn't have enough time to compile them to x86 instructions as well as, say, GCC could.
FWIW the native and WASM versions of my home computer emulators are within about 5% of each other (on an ARM Mac), e.g. more or less 'measuring noise':
https://floooh.github.io/tiny8bit/
You can squeeze out a bit more by building with -march=native, but then there's no reason that a WASM engine couldn't do the same.