With KV caching as it’s described there, it has to be a prefix match. OpenAI state in their docs that they don’t cache anything shorter than 1024 tokens, and I’m sure I read somewhere that they only cache in 1024-token blocks (so 1024, 2048, 3072, etc.), but I can’t find it now.
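Roughly, a block-quantized prefix cache could work like the sketch below. This is a minimal illustration of the idea, not anything OpenAI has documented: the block size, the hashing scheme, and the function names are all hypothetical.

```python
# Minimal sketch of prefix-only KV cache matching, quantized to
# 1024-token blocks. Hypothetical names; assumes the block size and
# boundaries described in the comment above, not a real provider API.
import hashlib

BLOCK = 1024  # assumed minimum cacheable prefix / block granularity

def block_hashes(tokens: list[int]) -> list[str]:
    """Hash every full BLOCK-sized prefix of the prompt."""
    hashes = []
    h = hashlib.sha256()
    for i, tok in enumerate(tokens, start=1):
        h.update(tok.to_bytes(4, "little"))
        if i % BLOCK == 0:
            hashes.append(h.copy().hexdigest())
    return hashes

def longest_cached_prefix(tokens: list[int], cache: set[str]) -> int:
    """Return how many leading tokens' KV entries can be reused.

    Only exact prefix matches count, and only at block boundaries:
    a prompt that diverges at token 1025 still reuses the first
    1024 tokens, but nothing after the divergence point.
    """
    reusable = 0
    for n_blocks, digest in enumerate(block_hashes(tokens), start=1):
        if digest not in cache:
            break
        reusable = n_blocks * BLOCK
    return reusable
```

This also shows why a single changed token near the start of a prompt kills the whole cache: every subsequent block hash changes, so nothing downstream matches.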
There’s been some research into how to cache chunks in the middle, but I don’t think any of the providers are doing it yet because it needs the prompt to be structured in a very specific way.
https://platform.openai.com/docs/guides/prompt-caching#requi...
> Caching is available for prompts containing 1024 tokens or more.
No mention of caching being in blocks of 1024 tokens thereafter.