There are so many more efficiencies possible at scale, though. I cannot keep a local model 98% utilized 24/7, at least not with my current workload. A big cloud can. I can’t power my servers with DC directly either; I have this AC-to-DC conversion nonsense. The list goes on.
Besides fill factor being hard to match, there is also scaling - you can't scale local inference 10x for a spike, but you can with cloud inference.
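To put rough shape on the fill factor and spike points, here is a back-of-the-envelope sketch in Python. The function names and all input numbers are made up for illustration, not measurements from any real setup. It assumes hardware cost per hour is fixed whether or not the box is busy.

```python
def cost_per_utilized_hour(hourly_cost: float, utilization: float) -> float:
    """Effective cost of one hour of useful work on hardware that costs
    `hourly_cost` per wall-clock hour and is busy `utilization` of the time."""
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    return hourly_cost / utilization


def baseline_utilization_if_sized_for_spike(spike_multiplier: float) -> float:
    """If local hardware is sized to absorb a spike of `spike_multiplier`
    times the normal load, this is the best-case utilization at normal load."""
    if spike_multiplier < 1:
        raise ValueError("spike_multiplier must be >= 1")
    return 1 / spike_multiplier


# Hypothetical numbers: the same $1.00/hour box at two fill factors.
print(cost_per_utilized_hour(1.00, 0.98))
print(cost_per_utilized_hour(1.00, 0.25))

# Sizing a local box for a 10x spike caps normal-load utilization at 10%.
print(baseline_utilization_if_sized_for_spike(10))
```

The two effects compound: hardware sized for the spike sits mostly idle the rest of the time, and that idle time is what pushes the effective cost per useful hour up.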