logoalt Hacker News

dakollitoday at 2:56 AM1 replyview on HN

That's not possible, read my comment above. These are private companies, there are no public filings regarding their profitability in any sense. You're just making things up.

If you have a machine running at 150 tok/ps you can only make $5820 a month at $15 per 1mm running 24/7. It costs a hell of a lot more than 6k a month to run Claude 4.7 @ 150 tok/ps on that machine 24/7.

This math is a bit off, because you have input tokens too, but regardless its still not profitable especially for how long it takes to turn around a request and the caching is probably not all that profitable.


Replies

mtonetoday at 3:59 AM

You're forgetting a critical factor: concurrency. If a given hardware serves a single request at 150 tokens/s, it can also serve 20-30 requests at 100 tokens/s. Suddenly your $5K becomes $100K/month, enough to recoup the cost of the hardware in a year or so.

The reason it works: each time you read the model (memory bound) to calculate the next token, you can also update multiple requests (compute bound) while at it. It's also much more energy-efficient per token.

[1] https://aimultiple.com/gpu-benchmark

show 1 reply