It's not unsolved, at least not the first part of your question. In fact, it's a feature offered by all the major LLM providers; see the links and the minimal sketch below:
- https://platform.openai.com/docs/guides/prompt-caching
- https://platform.claude.com/docs/en/build-with-claude/prompt...
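On the Anthropic side the caching is explicit: you mark a stable prefix with cache_control and check the usage fields to confirm it's being hit. Rough sketch with the Python SDK below (the model name and LONG_SHARED_CONTEXT are placeholders, not anything from the article); OpenAI's caching is automatic for sufficiently long prompts, with no flag needed.

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# Placeholder for the large, reused prompt prefix
# (system instructions, reference docs, etc.).
LONG_SHARED_CONTEXT = "...many thousands of tokens of stable instructions/context..."

response = client.messages.create(
    model="claude-sonnet-4-5",  # placeholder; use whatever model you're on
    max_tokens=1024,
    system=[
        {
            "type": "text",
            "text": LONG_SHARED_CONTEXT,
            # Mark the stable prefix as cacheable; later requests that reuse
            # the exact same prefix are billed at the cheaper cached-read rate.
            "cache_control": {"type": "ephemeral"},
        }
    ],
    messages=[{"role": "user", "content": "The part that changes per request"}],
)

# The usage object reports cache writes vs. reads, so you can verify it's working.
print(response.usage.cache_creation_input_tokens,
      response.usage.cache_read_input_tokens)
```

Note the cache only hits on an exact prefix match, and entries expire after a few minutes of inactivity (five by default, from what I recall), so anything that changes early in the prompt invalidates it.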
Dumb question, but is prompt caching available in Claude Code …?
Ah, that's good to know, thanks.
But then why is there compounding token usage in the article's trivial solution? Is it just a matter of using the cache correctly?