
brianwawok today at 2:32 PM

So many more efficiencies are possible at scale, though. I cannot keep a local model 98% utilized 24/7, at least not with my current workload. A big cloud can. I can’t power my servers with DC; I have this AC-to-DC conversion nonsense. The list goes on.


Replies

visarga today at 2:59 PM

Besides fill factor being hard to match, there is also scaling: you can't scale local inference 10x for a spike, but you can with cloud inference.