Hacker News

maho · today at 12:40 PM

The author only compared output token costs, but for typical agentic workloads, input tokens dominate the bill by a large margin. When running inference locally, input tokens are, to first order, free: they only incur implicit costs through higher time-to-first-token, higher power use, and lower output token speed.
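A back-of-envelope sketch of why this happens: an agentic loop resends the growing context on every turn, so input tokens accumulate quadratically while output tokens grow only linearly. All prices and token counts below are illustrative assumptions, not measured rates from any particular provider.

```python
# Illustrative cost split for a hypothetical agentic session.
# Prices and token counts are made-up assumptions for the sketch.
INPUT_PRICE = 3.00 / 1_000_000    # $/input token (hypothetical rate)
OUTPUT_PRICE = 15.00 / 1_000_000  # $/output token (hypothetical rate)

turns = 20
context = 8_000  # initial prompt + tool definitions (tokens)
reply = 500      # tokens generated per turn

# Each turn resends the whole history so far, plus prior replies.
input_tokens = sum(context + reply * t for t in range(turns))
output_tokens = reply * turns

input_cost = input_tokens * INPUT_PRICE
output_cost = output_tokens * OUTPUT_PRICE
print(f"input:  {input_tokens} tok -> ${input_cost:.2f}")
print(f"output: {output_tokens} tok -> ${output_cost:.2f}")
print(f"input/output cost ratio: {input_cost / output_cost:.1f}x")
```

Even with output tokens priced 5x higher per token here, the input side still ends up costing several times more, simply because the resent context dwarfs the generated text.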


Replies

Wilya · today at 2:26 PM

Yeah, that completely invalidates his point.

I looked at a couple of random agentic sessions in my OpenRouter activity, and the input cost was 10x the output cost.

Prompt caching on OpenRouter is complicated and unreliable. On local hardware with llama.cpp, it's mostly free.
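To make the caching point concrete, here is a sketch of how much prompt processing an ideal prefix cache saves in an agentic loop, where every turn resends the full history but only the newly appended suffix actually needs to be processed. The numbers are illustrative assumptions, not benchmarks of llama.cpp.

```python
# Sketch: prompt tokens processed with vs. without an ideal prefix
# cache in an agentic loop. All numbers are illustrative assumptions.
turns = 20
context = 8_000  # initial prompt + tool definitions (tokens)
reply = 500      # new tokens appended per turn

# No cache: every turn re-processes the entire growing prompt.
uncached = sum(context + reply * t for t in range(turns))

# Ideal prefix cache: the shared prefix is processed once, then each
# later turn only processes the newly appended suffix.
cached = context + reply * (turns - 1)

print(f"prompt tokens processed, no cache:   {uncached}")
print(f"prompt tokens processed, with cache: {cached}")
print(f"reduction: {uncached / cached:.0f}x")
```

This is why local prefix caching makes input tokens "mostly free": the work drops from quadratic in the number of turns to roughly linear.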