I'm watching a conference talk right now from 2 weeks ago: "I Hated Every Coding Agent So I Built My Own - Mario Zechner (Pi)", and in the middle he directly references this.
He demonstrates in the code that OpenCode aggressively trims context, by compacting on every turn, and pruning all tool calls from the context that occurred more than 40,000 tokens ago. Seems like it could be a good strategy to squeeze more out of the context window - but by editing the oldest context, it breaks the prompt cache for the entire conversation. There is effectively no caching happening at all.
I'm watching a conference talk right now from 2 weeks ago: "I Hated Every Coding Agent So I Built My Own - Mario Zechner (Pi)", and in the middle he directly references this.
He demonstrates in the code that OpenCode aggressively trims context, by compacting on every turn, and pruning all tool calls from the context that occurred more than 40,000 tokens ago. Seems like it could be a good strategy to squeeze more out of the context window - but by editing the oldest context, it breaks the prompt cache for the entire conversation. There is effectively no caching happening at all.
https://youtu.be/Dli5slNaJu0