Sure, it's $24/hour, but it'll crank through tens of thousands of tokens per second: those beefy GPUs are built for large amounts of parallel work. You'll never _get_ that many tokens per second out of a single request. That's why the math works out once you have dozens or hundreds of people using it.
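To make the amortization concrete, here's a quick back-of-the-envelope sketch. Only the $24/hour figure comes from above; the 20,000 tok/s aggregate throughput and the 50 tok/s single-stream speed are illustrative assumptions, not measurements of any particular GPU or model.

```python
# $24/hour is from the comment; the throughput numbers below are assumed.
HOURLY_COST_USD = 24.0


def cost_per_million_tokens(tokens_per_second: float,
                            hourly_cost: float = HOURLY_COST_USD) -> float:
    """Dollar cost per million generated tokens at a sustained throughput."""
    tokens_per_hour = tokens_per_second * 3600
    return hourly_cost / tokens_per_hour * 1_000_000


# Many concurrent requests batched together (assumed aggregate throughput).
batched = cost_per_million_tokens(20_000)
# One request decoding alone on the same box (assumed per-stream speed).
single = cost_per_million_tokens(50)

print(f"batched: ${batched:.2f} per million tokens")
print(f"single:  ${single:.2f} per million tokens")
```

Under those assumptions the same $24 buys about 400x more tokens when the box is kept busy with many users than when it serves one request at a time.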
No. The secret sauce is in KV caching: when to evict, when to keep, whether to pre-empt an active agent loop or someone who is showing signs of inactivity at their PC, etc.
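As a toy illustration of the kind of policy being described, here's a minimal sketch, assuming each session's KV cache occupies a known number of GPU memory blocks and that the server can tell whether a session is in an agent loop. `Session`, `eviction_order`, `evict_until_fits`, and the 60-second `idle_threshold` are all hypothetical names and values; real inference servers use far more elaborate schemes.

```python
from dataclasses import dataclass


@dataclass
class Session:
    session_id: str
    kv_blocks: int        # GPU memory blocks held by this session's KV cache
    last_active: float    # timestamp of the session's last request, in seconds
    in_agent_loop: bool   # hypothetical flag: an agent will likely call back soon


def eviction_order(sessions, now, idle_threshold=60.0):
    """Sort sessions from most to least evictable (hypothetical policy)."""
    def score(s):
        idle = now - s.last_active
        # A recently active agent loop is expected to send its next turn
        # quickly, so it goes to the back of the eviction queue.
        protected = s.in_agent_loop and idle < idle_threshold
        return (0 if protected else 1, idle)
    # Unprotected sessions first, longest-idle first within each group.
    return sorted(sessions, key=score, reverse=True)


def evict_until_fits(sessions, now, free_blocks, needed_blocks):
    """Evict sessions in priority order until needed_blocks are free.

    Returns the evicted session ids and the free block count afterwards.
    """
    evicted = []
    for s in eviction_order(sessions, now):
        if free_blocks >= needed_blocks:
            break
        free_blocks += s.kv_blocks
        evicted.append(s.session_id)
    return evicted, free_blocks


now = 1000.0
sessions = [
    Session("chat-a", kv_blocks=4, last_active=now - 300, in_agent_loop=False),
    Session("agent-b", kv_blocks=8, last_active=now - 2, in_agent_loop=True),
    Session("chat-c", kv_blocks=2, last_active=now - 10, in_agent_loop=False),
]
evicted, free = evict_until_fits(sessions, now, free_blocks=0, needed_blocks=5)
print(evicted, free)
```

Here the user who wandered off five minutes ago is evicted first, then the briefly idle chat, while the agent mid-loop keeps its cache. The design choice being sketched is simply that recomputing a cache for someone who may never return is cheap, whereas evicting an agent that will call back in two seconds forces an expensive prefill almost immediately.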