Hacker News

daxfohl 06/16/2025

Partially related: is charging by token sustainable for LLM shops? If the compute requirements go up quadratically with context length, doesn't that mean cost should as well?


Replies

sakras 06/16/2025

Typically requests are binned by context length so that they can be batched together. So you might have a 10k bin and a 50k bin and a 500k bin, and then you drop context past 500k. So the costs are fixed per-bin.
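
A rough sketch of that binning idea in Python, in case it helps. The bin limits are just the example numbers above, and `assign_bin` and `batch_by_bin` are hypothetical names, not any provider's actual scheduler. It assumes a request goes into the smallest bin that fits it and that anything past the largest bin is truncated.

```python
from bisect import bisect_left

# Example bin limits from the comment above (assumed, not real provider settings).
BIN_LIMITS = [10_000, 50_000, 500_000]


def assign_bin(context_tokens: int) -> int:
    """Return the smallest bin limit that fits the request.

    Context beyond the largest bin is dropped, so such requests
    land in the largest bin.
    """
    capped = min(context_tokens, BIN_LIMITS[-1])
    return BIN_LIMITS[bisect_left(BIN_LIMITS, capped)]


def batch_by_bin(request_lengths):
    """Group request lengths by bin so each group can be batched together."""
    batches = {limit: [] for limit in BIN_LIMITS}
    for length in request_lengths:
        batches[assign_bin(length)].append(length)
    return batches


if __name__ == "__main__":
    print(batch_by_bin([3_000, 42_000, 750_000]))
```

Under these assumptions a 3k request and a 9k request share the 10k bin, so whatever the bin costs to serve is the same for both, regardless of exact length.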
