Hacker News

eldenring, today at 1:59 AM

The only real benefit is privacy, which 99.9% of people don't care about. Almost all serving metrics (cost, throughput, TTFT) are better with large GPU clusters. Latency is usually hidden by prefill cost.


Replies

cowpig, today at 2:06 AM

More and more people I talk to care about privacy, but not in SF.