Hacker News

2001zhaozhao · today at 9:16 AM

Are you hosting your own infrastructure for coding agents? At first glance, sharing actual codebase context across compacts / multiple tasks seems hard to pull off with a good cost-benefit ratio unless you have vertical integration from inference all the way up to the coding agent harness.

I'm saying this because external LLM providers like OpenAI tend to charge quite a bit for longer-term caching, plus the 0.1x cache-read cost multiplied by the number of LLM calls. So I doubt context sharing would actually be that beneficial: you won't need all of the repeated context every time, and keeping it around means a longer prompt for every agentic task, which can increase API costs by more overall than the caching saves.
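
As a rough back-of-envelope sketch of that trade-off (the per-token prices below are made-up placeholders, not any provider's actual rates; only the 0.1x cache-read multiplier comes from the comment above):

    # Illustrative cost model: shared cached codebase context vs. sending only
    # task-specific context on each call. All prices are assumptions.
    INPUT_PRICE = 3.00 / 1_000_000        # assumed $/token for uncached input
    CACHE_READ_PRICE = 0.1 * INPUT_PRICE  # 0.1x multiplier on cache reads
    CACHE_WRITE_PRICE = 2 * INPUT_PRICE   # assumed premium for writing/retaining the cache

    def cost_with_shared_cache(shared_tokens, task_tokens, n_calls):
        # Pay once to write the shared context, then a cache read on every call,
        # plus the per-task context at full input price.
        return (shared_tokens * CACHE_WRITE_PRICE
                + n_calls * shared_tokens * CACHE_READ_PRICE
                + n_calls * task_tokens * INPUT_PRICE)

    def cost_without_shared_cache(task_tokens, n_calls):
        # Each call carries only the context that task actually needs.
        return n_calls * task_tokens * INPUT_PRICE

    # e.g. 200k-token shared repo context, 20k tokens per task, 50 calls:
    print(cost_with_shared_cache(200_000, 20_000, 50))   # ~7.2
    print(cost_without_shared_cache(20_000, 50))         # ~3.0

Under those assumptions the shared cached context roughly doubles the bill, which is the point: the cache-read multiplier applies on every call, whether or not the task needed that context.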


Replies

eshaham78 · today at 6:09 PM

The 'scale' is more about automation volume: multiple daily sessions across platforms, recurring tasks, and various automation scripts running continuously. Not infrastructure scale, but task scale. The cost optimization comes from designing tools that reduce token churn rather than from owning the inference layer.
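
As a hypothetical illustration of that kind of tool design (the function, names, and limits here are invented, not the commenter's actual tooling), a search tool might return only capped, trimmed matches instead of whole files, so each agent call carries less repeated context:

    # Sketch of a token-frugal repo search tool: return at most MAX_MATCHES
    # trimmed lines rather than dumping entire files into the context.
    from pathlib import Path

    MAX_MATCHES = 20
    MAX_LINE_CHARS = 200

    def search_repo(root: str, needle: str) -> str:
        hits = []
        for path in Path(root).rglob("*.py"):
            try:
                text = path.read_text(errors="ignore")
            except OSError:
                continue
            for i, line in enumerate(text.splitlines(), 1):
                if needle in line:
                    hits.append(f"{path}:{i}: {line.strip()[:MAX_LINE_CHARS]}")
                    if len(hits) >= MAX_MATCHES:
                        return "\n".join(hits) + "\n... (truncated)"
        return "\n".join(hits) or "no matches"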