Hacker News

mbesto · yesterday at 4:34 PM

> The marginal cost of an API call is small relative to what users pay, and utilization rates at scale are pretty high.

How do you know this?

> You don't need perfect certainty about GPU lifespan to see that the spread between cost-per-token and revenue-per-token leaves a lot of room.

You can't even estimate this spread without at least a rough idea of cost-per-token. Right now, any cost-per-token figure is pure paper math.
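To make that concrete, here's a back-of-envelope sketch. Every number in it is made up for illustration (GPU capex, lifespan, power draw, utilization, throughput are all assumptions, not anyone's actual figures); the point is how wildly the implied cost-per-token swings depending on which assumptions you plug in.

```python
# Back-of-envelope cost-per-token. ALL inputs are hypothetical;
# the takeaway is the sensitivity to assumptions, not the numbers.

def cost_per_million_tokens(gpu_capex_usd, lifespan_years,
                            power_kw, usd_per_kwh,
                            utilization, tokens_per_second):
    hours = lifespan_years * 365 * 24
    capex_per_hour = gpu_capex_usd / hours          # straight-line amortization
    energy_per_hour = power_kw * usd_per_kwh        # ignores cooling, networking, staff
    usd_per_hour = capex_per_hour + energy_per_hour
    tokens_per_hour = tokens_per_second * 3600 * utilization
    return usd_per_hour / tokens_per_hour * 1_000_000

# Same hypothetical GPU, two sets of assumptions about lifespan,
# power price, and utilization:
optimistic = cost_per_million_tokens(30_000, 5, 0.7, 0.08, 0.6, 2000)
pessimistic = cost_per_million_tokens(30_000, 2, 0.7, 0.12, 0.2, 2000)
print(round(optimistic, 2), round(pessimistic, 2))
```

With these particular made-up inputs the two scenarios differ by several multiples, which is the whole point: until the assumptions are pinned down, the "spread" can be made to look like almost anything.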

> And datacenter GPUs have been running inference workloads for years now,

And inference resource intensity is a moving target: a new model could come out tomorrow that requires 2x the resources per token, erasing whatever spread exists today.

> They're not throwing away two-year-old chips.

Maybe, but they'll be replaced once either (a) a higher-performance GPU can deliver the same results with less energy, less physical density, and less cooling, or (b) the extended support costs become financially untenable.


Replies

Leynos · today at 10:19 AM

If a model costs them 2x as much to serve, they charge 2x as much. That much is clear from their API pricing.
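The pass-through argument is just proportional pricing: if price scales with cost, the gross margin percentage is unchanged even when per-token cost doubles. A tiny sketch with purely illustrative numbers (neither the cost nor the price here is any real provider's figure):

```python
# If price moves in proportion to cost, the margin *percentage*
# survives a model that costs 2x as much to serve.
# Numbers are hypothetical, in USD per million tokens.
cost, price = 1.0, 4.0
margin = (price - cost) / price          # 0.75

cost2, price2 = 2 * cost, 2 * price      # new model: 2x cost, 2x price
margin2 = (price2 - cost2) / price2      # still 0.75
print(margin, margin2)
```

The absolute dollar margin per token doubles too, so heavier models don't shrink the spread as long as pricing tracks cost.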