Hacker News

charcircuit, yesterday at 11:44 PM

You are making the false assumption that all token consumption costs the same, when it doesn't. Yes, in the limit the price to serve the model and generate a response is O(tokens), but generating a new token can be cheaper when the context holds fewer tokens than when it holds more. If other harnesses prompt with more tokens than Claude Code, they can be more expensive to serve.
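As a rough illustration of the point above, here is a toy cost model in Python. It assumes each generated token pays a fixed cost plus a cost proportional to the tokens already in context (roughly what attention over a cached context would imply). The function name and both constants are made up for the sketch, and it ignores the cost of processing the prompt itself, so treat the numbers as relative, not as anyone's real serving costs.

```python
def decode_cost(prompt_tokens: int, output_tokens: int,
                fixed: float = 1.0, per_context_token: float = 0.001) -> float:
    """Toy cost of generating `output_tokens` after a prompt of `prompt_tokens`.

    Each new token pays a fixed cost plus a cost proportional to the number
    of tokens already in context. Both constants are hypothetical.
    """
    total = 0.0
    for i in range(output_tokens):
        context_len = prompt_tokens + i  # tokens the new token must attend to
        total += fixed + per_context_token * context_len
    return total


# Same number of generated tokens, different prompt sizes.
short_prompt = decode_cost(prompt_tokens=2_000, output_tokens=500)
long_prompt = decode_cost(prompt_tokens=50_000, output_tokens=500)
print(f"short prompt: {short_prompt:.2f}, long prompt: {long_prompt:.2f}")
```

Under these assumptions the 500 output tokens behind the longer prompt cost more to generate, even though a quota that only counts output tokens would treat both runs the same.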


Replies

stavros, yesterday at 11:47 PM

They have limits. I don't care how expensive it is to serve: I'm paying them for a given number of tokens (a limit which THEY SET), and they also want to dictate where I spend those tokens.
