Thanks for the post and the explanation.
I really enjoyed this related article about prompt caching, where the author explains some of the same principles with a few additional visuals, though its main point is why KV cache hits make your LLM API usage much cheaper: https://ngrok.com/blog/prompt-caching/
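
For anyone curious, here's a rough back-of-the-envelope sketch of that cost argument. The per-token price, cache discount, and token counts below are made-up placeholders, not any provider's actual rates, just to show how reusing a cached prefix changes the bill:

```python
# Hypothetical pricing -- placeholders, not any provider's real rates.
PRICE_PER_INPUT_TOKEN = 3.00 / 1_000_000   # cost per uncached input token
CACHED_DISCOUNT = 0.10                      # assume cached prefix tokens bill at 10% of that

def request_cost(prefix_tokens: int, new_tokens: int, cache_hit: bool) -> float:
    """Input-token cost for one request: a shared prefix plus request-specific content."""
    if cache_hit:
        # Only the shared prefix benefits from the KV cache; new tokens bill normally.
        return (prefix_tokens * CACHED_DISCOUNT + new_tokens) * PRICE_PER_INPUT_TOKEN
    return (prefix_tokens + new_tokens) * PRICE_PER_INPUT_TOKEN

# A long, stable system prompt reused across 1,000 requests.
prefix, per_request, n = 8_000, 500, 1_000
cold = request_cost(prefix, per_request, cache_hit=False)   # first request, cache miss
warm = request_cost(prefix, per_request, cache_hit=True)    # subsequent cache hits
print(f"no caching:   ${cold * n:.2f}")
print(f"with caching: ${cold + warm * (n - 1):.2f}")
```

The savings scale with how much of each request is a stable, repeated prefix, which is why the article's advice boils down to keeping the shared part of your prompt at the front and the variable part at the end.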