I imagine an FPGA could just be part of a general CPU, with user-space APIs to program it to accelerate certain workflows; in other words, this sounds exactly like JIT to me. People could program the FPGA as they need to, e.g. an AV1 encoder/decoder, accelerating some NN layers, or even a JS runtime. Am I imagining something too wild for the hardware's capability, or is it just that the ecosystem isn't there yet to allow such flexible use cases?
Digital logic design isn't software programming, and today's FPGAs are for most intents and purposes 'single-configuration-at-a-time' devices - you can't realistically time-slice them.
The placement-and-routing flow for these devices is an NP-complete problem, and while the tools are technically deterministic (the exact same HDL will typically produce identical results), they are chaotically sensitive to their input: even slightly different HDL can produce radically different results.
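To get an intuition for why, here's a toy Python sketch of placement as simulated annealing. The grid size, netlist, and cooling schedule are all made up; real tools attack the same kind of combinatorial problem at a vastly larger scale:

    import math, random

    random.seed(0)  # identical input -> identical result (deterministic)

    GRID = 4                                 # 4x4 grid of logic sites
    CELLS = 6                                # cells to place
    NETS = [(0, 1), (1, 2), (2, 3), (3, 0),  # made-up netlist: pairs of
            (0, 2), (4, 5), (5, 0)]          # connected cells

    def wirelength(place):
        # Total Manhattan distance over all nets -- the cost to minimize.
        return sum(abs(place[a][0] - place[b][0]) + abs(place[a][1] - place[b][1])
                   for a, b in NETS)

    # Random initial placement: one distinct grid site per cell.
    place = random.sample([(x, y) for x in range(GRID) for y in range(GRID)], CELLS)

    temp = 5.0
    for _ in range(20000):
        a, b = random.sample(range(CELLS), 2)
        before = wirelength(place)
        place[a], place[b] = place[b], place[a]      # propose swapping two cells
        worse = wirelength(place) - before
        if worse > 0 and random.random() > math.exp(-worse / temp):
            place[a], place[b] = place[b], place[a]  # reject: undo the swap
        temp *= 0.9995                               # cool down

    print("final wirelength:", wirelength(place))

Perturb the netlist by a single net and the annealer can settle into a completely different placement - that's the small-scale version of "slightly different HDL, radically different results."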
All of the use cases you've mentioned (AV1 decoders, NN layers, but especially a JS runtime) require phenomenal amounts of physical die area, even on modern processes. For all but the most niche problems, a CPU - at massively higher clock speeds - will run circles around whatever you can fit in the die area you could afford to spare.
My rule of thumb: an FPGA needs about 40x the silicon area of an ASIC, runs at a clock speed around 5x lower, and consumes a lot more power.
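Put concretely, with those same rule-of-thumb numbers (illustrative only, not measurements):

    fpga_area_ratio = 40    # ~40x the silicon area of hardened logic
    fpga_clock_ratio = 5    # ~5x lower clock speed

    # Per unit of die area per unit of time, FPGA logic lands roughly
    # 40 * 5 = 200x behind dedicated silicon, before counting power.
    print(fpga_area_ratio * fpga_clock_ratio, "x combined disadvantage")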
If you have an application that can be done on a CPU, with lots of sequential dependencies (such as video compression/decompression), an FPGA doesn't stand a chance compared to adding dedicated silicon area.
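The sequential-dependency point is easy to see in code. Here's a toy Python loop, vaguely in the spirit of an entropy decoder (the state machine is made up, not any real codec):

    def toy_decode(bits):
        state = 1
        out = []
        for b in bits:                     # every iteration reads `state`...
            state = (state << 1) | b       # ...written by the previous one,
            if state & 0b1000:             # so iterations cannot run side
                out.append(state & 0b111)  # by side on an FPGA's fabric
                state = 1
        return out

    print(toy_decode([1, 0, 1, 1, 0, 0, 1, 1, 0]))  # -> [5, 4, 6]

An FPGA's advantage is spatial parallelism; a dependency chain like this reduces the problem to how fast one loop of logic can iterate, which is exactly where a high-clocked CPU or a hardened block wins.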
That's even more true if you embed an FPGA on a CPU die. Intel tried it, and the result was a power-hungry jack of all trades, master of none, that nobody knew what to do with.
Xilinx's MPSoC and RFSoC parts are successful, but their CPUs are comparatively low-performance and are used as application-specific orchestrators, never as generic CPUs running traditional desktop or server software.