Keep in mind that a 4090 buries a 9950X only for FP32 computations.
For FP64 computations, the reverse happens: a 9950X buries a 4090, despite the latter having 3 times the price and 2.5 times the power consumption.
For FP64 operations, the 4090 and the 9950X can do a similar number of operations per clock cycle (288 vs. 256), but the 9950X runs at double the clock frequency, and it is easier to reach a high fraction of the maximum theoretical throughput on a 9950X than on a 4090.
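To make the arithmetic concrete, here is a rough sketch of the peak comparison using only the figures quoted above. The clocks are expressed as relative values (4090 = 1.0, 9950X = 2.0), which is an assumption taken from the "double" claim rather than measured frequencies, and the function and constant names are just illustrative. It does not model the achievable fraction of peak, which would widen the gap further in favor of the CPU.

```python
def peak_throughput(ops_per_clock: int, relative_clock: float) -> float:
    """Theoretical peak in operations per unit time (arbitrary units)."""
    return ops_per_clock * relative_clock


# FP64 operations per clock cycle, as quoted in the comment above.
GPU_4090_OPS_PER_CLOCK = 288
CPU_9950X_OPS_PER_CLOCK = 256

# Assumption: clocks are relative to the 4090 (1.0), with the 9950X at
# double that (2.0). Real clocks vary with load, power limits and cooling.
gpu_peak = peak_throughput(GPU_4090_OPS_PER_CLOCK, 1.0)
cpu_peak = peak_throughput(CPU_9950X_OPS_PER_CLOCK, 2.0)

ratio = cpu_peak / gpu_peak
print(f"9950X / 4090 theoretical FP64 peak: {ratio:.2f}x")  # prints 1.78x
```

So even before accounting for how hard it is to keep the GPU's FP64 units busy, the theoretical peak already favors the 9950X by a bit under 2x under these assumptions.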
What about FP8? It is a very popular target for LLM inference.