Hacker News

woadwarrior01, yesterday at 3:12 PM

It'd be terribly compute-inefficient not to share prefix caches (KV cache) across customers.


Replies

acepl, yesterday at 3:21 PM

What is the probability that two customers will have exactly the same tokens in cache? Wouldn't it require using the exact same CLAUDE.md, skills, MCPs and context? After that it gets even worse, given the nondeterminism of both LLMs and humans.
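To make the question concrete, here is a minimal sketch of how a prefix cache could decide what two requests have in common. It assumes a block-hashed design in which each block's key also depends on every token before it; this is an illustration, not a description of any particular provider's implementation. `BLOCK_SIZE`, `block_hashes`, `shared_prefix_blocks` and the token lists are all hypothetical names and values.

```python
import hashlib

BLOCK_SIZE = 4  # hypothetical number of tokens per cache block


def block_hashes(tokens, block_size=BLOCK_SIZE):
    """Chain-hash each full block, so a block's key depends on all earlier tokens."""
    hashes = []
    parent = b""
    full_len = len(tokens) - len(tokens) % block_size  # ignore a trailing partial block
    for i in range(0, full_len, block_size):
        block = tokens[i:i + block_size]
        digest = hashlib.sha256(parent + repr(block).encode()).digest()
        hashes.append(digest)
        parent = digest
    return hashes


def shared_prefix_blocks(a, b):
    """Count the leading blocks that two token sequences could reuse from one cache."""
    count = 0
    for hash_a, hash_b in zip(block_hashes(a), block_hashes(b)):
        if hash_a != hash_b:
            break
        count += 1
    return count


# Made-up token IDs: both requests start with the same 8 tokens, then diverge.
common_prefix = list(range(8))
customer_a = common_prefix + [100, 101, 102, 103, 104]
customer_b = common_prefix + [200, 201, 202, 203]

print(shared_prefix_blocks(customer_a, customer_b))  # → 2
```

Under these assumptions, reuse stops at the first block that differs, so the amount shared depends on how many leading tokens are byte-for-byte identical. Whether two customers' real prompts ever begin that way is exactly what is being asked here.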
