Horrific comparison point. LLM inference is way more expensive locally for single users than running batch inference at scale in a datacenter on actual GPUs/TPUs.
How is that horrific? It sets an upper bound on the cost, which turns out to be not very high.
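For anyone who wants to see the shape of that upper-bound argument, here is a rough back-of-envelope sketch. Every number in it is a made-up placeholder, not a measurement; the point is only how the amortized-hardware and electricity terms combine into a per-token cost that batched datacenter inference should undercut.

```python
# Sketch of the "upper bound" argument: if a single user can generate tokens
# locally at some rate on hardware they pay for, that per-token cost bounds
# what a provider needs, since batched inference shares the same weights
# across many requests and is cheaper per token, not more expensive.
# All numbers below are illustrative placeholders, not measurements.

def local_cost_per_million_tokens(
    hardware_price_usd: float,      # assumed up-front cost of the machine
    hardware_lifetime_hours: float, # assumed useful life in hours of use
    power_draw_watts: float,        # assumed average draw while generating
    electricity_usd_per_kwh: float, # assumed local electricity price
    tokens_per_second: float,       # assumed single-user decode throughput
) -> float:
    """Rough USD cost to generate one million tokens on local hardware."""
    hours_per_million = 1_000_000 / tokens_per_second / 3600
    amortized_hardware = hardware_price_usd / hardware_lifetime_hours * hours_per_million
    energy = power_draw_watts / 1000 * hours_per_million * electricity_usd_per_kwh
    return amortized_hardware + energy

# Example with invented inputs: a $2,000 machine amortized over ~3 years of
# 8-hour days, 300 W while generating, $0.15/kWh, 30 tokens/s decode.
upper_bound = local_cost_per_million_tokens(2000, 3 * 365 * 8, 300, 0.15, 30)
print(f"~${upper_bound:.2f} per million tokens (local, single user)")
```

With those placeholder inputs it lands in the low single-digit dollars per million tokens, which is the sense in which the bound "turns out to be not very high"; swap in your own hardware and power numbers to get a figure you trust.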