Hacker News

mickeyp today at 11:21 AM

Sure, it's $24/hour, but it'll crank through tens of thousands of tokens per second; those beefy GPUs are meant for large, parallel workloads. You'll never _get_ that many tokens for a single request. That's why the math only works when you have dozens or hundreds of people using it.
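
To put rough numbers on that, here is a back-of-the-envelope sketch. Only the $24/hour figure comes from the comment. The 20,000 tokens/second aggregate throughput and the 50 tokens/second single-request speed are assumptions picked for illustration, not measurements.

```python
HOURLY_COST_USD = 24.0  # from the comment above


def cost_per_million_tokens(tokens_per_second: float,
                            hourly_cost: float = HOURLY_COST_USD) -> float:
    """Cost in USD to generate one million tokens at a sustained rate."""
    tokens_per_hour = tokens_per_second * 3600
    return hourly_cost / tokens_per_hour * 1_000_000


# Assumed: many users batched together keep the GPUs saturated.
batched = cost_per_million_tokens(20_000)

# Assumed: one lone request decodes at a much lower rate.
single = cost_per_million_tokens(50)

print(f"batched: ${batched:.2f} per million tokens")
print(f"single:  ${single:.2f} per million tokens")
```

Under those assumed rates the same hardware is about 400x cheaper per token when it is shared, which is the point about needing dozens or hundreds of concurrent users.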

No. The sauce is in KV caching: when to evict, when to keep, how to pre-empt an active agent loop vs. someone who is showing signs of inactivity at their PC, etc.
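
A toy sketch of what such a policy might look like, assuming a simple three-tier priority: idle users are evicted first, recently active users next, and sessions in the middle of an agent loop last. Every name here (`Session`, `pick_evictions`, `idle_after`) is hypothetical, and the 120-second idle threshold is arbitrary. Real inference servers use more involved schedulers; this only illustrates the kind of decision being described.

```python
from dataclasses import dataclass


@dataclass
class Session:
    session_id: str
    kv_tokens: int       # size of this session's cached KV entries, in tokens
    last_active: float   # timestamp of the last request, in seconds
    in_agent_loop: bool  # True if a tool-calling loop is expected back soon


def pick_evictions(sessions, tokens_needed, now, idle_after=120.0):
    """Return session ids to evict until at least tokens_needed are freed."""

    def priority(s):
        idle = (now - s.last_active) >= idle_after
        if idle:
            tier = 0  # looks inactive: cheapest to evict
        elif s.in_agent_loop:
            tier = 2  # active agent loop: keep its cache if at all possible
        else:
            tier = 1  # recently active human
        # Within a tier, evict the least recently active session first.
        return (tier, s.last_active)

    freed, victims = 0, []
    for s in sorted(sessions, key=priority):
        if freed >= tokens_needed:
            break
        victims.append(s.session_id)
        freed += s.kv_tokens
    return victims


sessions = [
    Session("agent", kv_tokens=8000, last_active=995.0, in_agent_loop=True),
    Session("idle", kv_tokens=6000, last_active=700.0, in_agent_loop=False),
    Session("typing", kv_tokens=4000, last_active=990.0, in_agent_loop=False),
]
print(pick_evictions(sessions, tokens_needed=5000, now=1000.0))
```

The tiering is the design choice worth noting: evicting an agent loop's cache forces a full prefill on its next step a few seconds later, while an idle user may not come back at all.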