Hacker News

mbesto · yesterday at 4:34 PM

> The marginal cost of an API call is small relative to what users pay, and utilization rates at scale are pretty high.

How do you know this?

> You don't need perfect certainty about GPU lifespan to see that the spread between cost-per-token and revenue-per-token leaves a lot of room.

You can't even estimate this spread without at least a rough idea of cost-per-token. Right now, any cost-per-token figure is pure paper math.
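To make that concrete, here's a back-of-envelope sketch. Every number in it is made up for illustration (GPU capex, lifespan, power draw, utilization, throughput are all assumptions, not anyone's actual figures); the point is how wildly the implied cost-per-token swings depending on which assumptions you plug in.

```python
# Back-of-envelope cost-per-token. ALL inputs are hypothetical;
# the takeaway is the sensitivity to assumptions, not the numbers.

def cost_per_million_tokens(gpu_capex_usd, lifespan_years,
                            power_kw, usd_per_kwh,
                            utilization, tokens_per_second):
    hours = lifespan_years * 365 * 24
    capex_per_hour = gpu_capex_usd / hours          # straight-line amortization
    energy_per_hour = power_kw * usd_per_kwh        # ignores cooling, networking, staff
    usd_per_hour = capex_per_hour + energy_per_hour
    tokens_per_hour = tokens_per_second * 3600 * utilization
    return usd_per_hour / tokens_per_hour * 1_000_000

# Same hypothetical GPU, two sets of assumptions about lifespan,
# power price, and utilization:
optimistic = cost_per_million_tokens(30_000, 5, 0.7, 0.08, 0.6, 2000)
pessimistic = cost_per_million_tokens(30_000, 2, 0.7, 0.12, 0.2, 2000)
print(round(optimistic, 2), round(pessimistic, 2))
```

With these particular made-up inputs the two scenarios differ by several multiples, which is the whole point: until the assumptions are pinned down, the "spread" can be made to look like almost anything.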

> And datacenter GPUs have been running inference workloads for years now,

And inference resource intensity is a moving target: a new model could come out tomorrow that requires 2x the resources per token, erasing whatever spread exists today.

> They're not throwing away two-year-old chips.

Maybe, but they'll be replaced once either (a) a higher-performance GPU can deliver the same results with less energy, less physical density, and less cooling, or (b) the extended support costs become financially untenable.


Replies

Leynos · today at 10:19 AM

If a model costs them 2x as much to serve, they charge 2x as much. That much is clear from their API pricing.
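The pass-through argument is just proportional pricing: if price scales with cost, the gross margin percentage is unchanged even when per-token cost doubles. A tiny sketch with purely illustrative numbers (neither the cost nor the price here is any real provider's figure):

```python
# If price moves in proportion to cost, the margin *percentage*
# survives a model that costs 2x as much to serve.
# Numbers are hypothetical, in USD per million tokens.
cost, price = 1.0, 4.0
margin = (price - cost) / price          # 0.75

cost2, price2 = 2 * cost, 2 * price      # new model: 2x cost, 2x price
margin2 = (price2 - cost2) / price2      # still 0.75
print(margin, margin2)
```

The absolute dollar margin per token doubles too, so heavier models don't shrink the spread as long as pricing tracks cost.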