> [...] but I can only guess it’s because it completely breaks the context caching.
Yes, but you only redo this every once in a while, right? That's a constant-factor overhead. If you instead feed in just the last few thousand tokens, you get no caching at all (assuming the conversation is long enough that this window of 'last few thousand tokens' doesn't cover the whole thing).
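
To make the comparison concrete, here is a rough toy model, not how any particular provider implements caching. It assumes the cache only helps for the longest prefix the new prompt shares with the previous one, uses made-up sizes (100 new tokens per turn, a 2000-token limit, a 200-token summary), and ignores the cost of generating the summary itself. The names `simulate` and `common_prefix_len` are just for illustration.

```python
from itertools import count


def common_prefix_len(a, b):
    """Number of leading tokens that two prompts share."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def simulate(strategy, turns=50, step=100, limit=2000, summary=200):
    """Total tokens that miss the (assumed) prefix cache over a conversation.

    strategy="compact": when the context exceeds `limit`, replace it with a
                        fresh `summary`-token summary (cache breaks once).
    strategy="window":  when the context exceeds `limit`, keep only the last
                        `limit` tokens (the prefix shifts every turn).
    """
    fresh = count()  # each token gets a unique integer id
    context, prev, recomputed = [], [], 0
    for _ in range(turns):
        context = context + [next(fresh) for _ in range(step)]
        if len(context) > limit:
            if strategy == "compact":
                context = [next(fresh) for _ in range(summary)]
            else:
                context = context[-limit:]
        # Only tokens after the shared prefix need to be processed again.
        recomputed += len(context) - common_prefix_len(prev, context)
        prev = context
    return recomputed


if __name__ == "__main__":
    print("compact:", simulate("compact"))
    print("window: ", simulate("window"))
```

With these made-up numbers, periodic compaction only breaks the prefix on the two turns where it fires, while the sliding window reprocesses its full 2000 tokens on every turn once the conversation has outgrown it.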