I bought a used 16 GB Intel A770 GPU for $200, and it's capable of running pretty powerful open Stable Diffusion and large language models.
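For anyone curious, getting Stable Diffusion going on the Arc card can look roughly like this (a minimal sketch, assuming PyTorch with Intel's XPU support via intel_extension_for_pytorch is installed; the checkpoint name is just an example, not a recommendation):

```python
import torch
import intel_extension_for_pytorch as ipex  # registers the "xpu" device with PyTorch
from diffusers import StableDiffusionPipeline

# Load an example open checkpoint in half precision to fit comfortably in 16 GB of VRAM
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16,
)
pipe = pipe.to("xpu")  # move the whole pipeline onto the A770

# Generate a single image from a text prompt
image = pipe("a photo of a cat in a spacesuit").images[0]
image.save("cat.png")
```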
Sure, I could get more performance out of proprietary models on much more expensive hardware, but there are diminishing returns, and consumer hardware and open models keep getting better.
I don't think the big investments into hosting models will pay off, especially as the baseline capabilities of integrated GPUs become enough to run a good model at home.
What kind of context size and speeds are you getting?