Does a GPU doing inference serve enough customers for long enough to bring in enough revenue to pay for a replacement GPU in two years (plus the power and running costs of the GPU and its infrastructure)? That's the question you need to be asking.
If the answer is yes, then they are making money on inference. If the answer is no, the market is going to have a bad time.
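
For anyone who wants to plug in their own numbers, here is a rough sketch of that break-even check. Everything in it is an assumption for illustration: the function name, the parameters, the flat hourly running cost that is charged whether or not the GPU is busy, and above all the example figures, which are placeholders and not real market data.

```python
HOURS_PER_YEAR = 365 * 24


def pays_for_replacement(
    revenue_per_busy_hour: float,   # hypothetical: income while the GPU is serving requests
    utilization: float,             # hypothetical: fraction of hours the GPU is busy (0..1)
    replacement_cost: float,        # hypothetical: price of the next GPU
    running_cost_per_hour: float,   # hypothetical: power + infrastructure, charged every hour
    years: float = 2.0,             # replacement window from the comment above
) -> bool:
    """Return True if inference revenue over the window covers the
    replacement GPU plus running costs, under the simplifications above."""
    hours = HOURS_PER_YEAR * years
    revenue = revenue_per_busy_hour * utilization * hours
    cost = replacement_cost + running_cost_per_hour * hours
    return revenue >= cost


if __name__ == "__main__":
    # Placeholder inputs only; swap in your own estimates.
    print(pays_for_replacement(4.0, 0.7, 30_000, 1.0))
    print(pays_for_replacement(2.0, 0.7, 30_000, 1.0))
```

It ignores things like financing, staff, resale value and price changes over the two years, so treat the result as a first sanity check and nothing more.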