The only real benefit is privacy, which 99.9% of people don't care about. Almost all serving metrics (cost, throughput, TTFT) are better with large GPU clusters. Network latency is usually hidden by prefill cost.
More and more people I talk to care about privacy, but not in SF