>running large models on shared, dedicated hosted hardware at full utilization is going to be vastly more cost-efficient for the foreseeable future.
That is only true right now because hundreds of billions of dollars are being burned by these AI companies to try to win market share. If you paid what it actually costs, your comment would likely be very different.
We don't know the parameter counts, but it probably takes at least an H100, and possibly several, to run a SOTA model. Given the pricing ($25k+ per H100, plus the host hardware around it) and the power draw (700W per H100, again plus the host), I don't see how anyone except a largish company can afford to run this.
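To make the affordability point concrete, here is a rough amortization sketch. Every number is an assumption for illustration (the 3-year depreciation window, the 1.5x host/cooling power overhead, the $0.15/kWh electricity rate), not a vendor quote:

```python
# Back-of-envelope: amortized hourly cost of self-hosting one H100.
# All figures are illustrative assumptions, not quotes.

HW_COST = 25_000        # USD for one H100; host chassis not included (assumed)
LIFETIME_YEARS = 3      # assumed depreciation window
POWER_W = 700           # H100 board power
OVERHEAD = 1.5          # assumed multiplier for host CPU, fans, cooling
KWH_PRICE = 0.15        # USD per kWh, assumed

hours = LIFETIME_YEARS * 365 * 24
energy_kwh = POWER_W * OVERHEAD / 1000 * hours
energy_cost = energy_kwh * KWH_PRICE
per_hour = (HW_COST + energy_cost) / hours

print(f"energy over {LIFETIME_YEARS} years: ${energy_cost:,.0f}")
print(f"amortized cost: ${per_hour:.2f}/hour")
```

Under these assumptions the card alone works out to roughly a dollar per hour before you've bought the rest of the server, which is the kind of math that only pencils out at high utilization.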
No, it's economies of scale, and I don't understand where anyone is coming from who thinks they'll be better off buying their own hardware. Why would you get a better deal on MATMULs/watt than the cloud providers?