Not a single mention of prompt caching in this article, which is a massive benefit of append-only context.
Cost-wise, yes, but in terms of actually getting the best work done? Meh, not helpful!
I think what's really missing here is a comparison of different tries from the same head. And there, prompt caching does help!
If it were, I can in theory see situations where improving content cleanliness is worth blowing away the KV cache.
But I absolutely can't see how feeding the entire context into a more expensive model multiple times per task, just to propose context edits that might indirectly help, could ever be worthwhile.
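To put rough numbers on that, here's a back-of-the-envelope sketch. Everything in it is an assumption for illustration: the function names are made up, prices are in arbitrary units per token, cached input tokens are assumed to cost 10% of fresh ones, any cache-write surcharge is ignored, and an edit is assumed to invalidate the whole cache. `editor_multiplier` stands in for the pricier model re-reading the full context each turn.

```python
def request_cost(cached, fresh, price=1.0, cached_discount=0.1):
    """Input cost of one request; cached tokens are billed at a discount (assumed 10%)."""
    return price * (cached * cached_discount + fresh)


def append_only_cost(prefix, per_turn, turns, price=1.0, cached_discount=0.1):
    """Append-only context: everything sent on the previous turn is a cache hit."""
    total, cached, context = 0.0, 0, prefix
    for _ in range(turns):
        total += request_cost(cached, context - cached, price, cached_discount)
        cached = context          # the whole context is now cached
        context += per_turn       # new tokens appended for the next turn
    return total


def edited_context_cost(prefix, per_turn, turns, price=1.0,
                        keep_ratio=1.0, editor_multiplier=0.0):
    """Edited context: assume each edit invalidates the cache entirely.

    keep_ratio: fraction of the context that survives the edit (hypothetical).
    editor_multiplier: price of the editing model relative to the main one;
    it reads the full, unedited context once per turn.
    """
    total, context = 0.0, prefix
    for _ in range(turns):
        total += context * price * editor_multiplier           # editor pass
        total += request_cost(0, context * keep_ratio, price)  # no cache hits
        context += per_turn
    return total


if __name__ == "__main__":
    print(append_only_cost(1000, 100, 3))
    print(edited_context_cost(1000, 100, 3, keep_ratio=0.5))
    print(edited_context_cost(1000, 100, 3, editor_multiplier=3.0))
```

This only counts input tokens and says nothing about output quality, which is the other half of the argument. Under these assumptions, though, the edits would have to improve results a lot to pay for the lost cache hits plus the editor pass.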