
killerstorm, yesterday at 8:34 PM

I meant caching at a larger scale. If you're an organization with 100 developers each doing 10 sessions a day, you're paying for the tokens of a frequently used document 1,000 times over every day, even if you had 100% KV cache hits within each session. Apparently that's too costly even for companies with trillion-dollar market caps...
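To make the arithmetic concrete, here is a rough sketch. It assumes each session prefills the shared document exactly once (perfect caching within a session), and the document size and the `daily_prefill_tokens` helper are made up for illustration. With 100 developers and 10 sessions each, that is 100 × 10 = 1,000 sessions per day.

```python
def daily_prefill_tokens(developers, sessions_per_dev, doc_tokens, shared_cache):
    """Tokens of the shared document that get prefilled (and billed) per day."""
    sessions = developers * sessions_per_dev
    # Per-session caching only: every session prefills the document once.
    # Hypothetical organization-wide cache: it is prefilled once in total.
    return doc_tokens if shared_cache else sessions * doc_tokens


DOC_TOKENS = 50_000  # assumed document size, purely illustrative

per_session = daily_prefill_tokens(100, 10, DOC_TOKENS, shared_cache=False)
org_wide = daily_prefill_tokens(100, 10, DOC_TOKENS, shared_cache=True)
ratio = per_session // org_wide

print(f"per-session caching: {per_session:,} tokens/day")
print(f"org-wide caching:    {org_wide:,} tokens/day")
print(f"overhead factor:     {ratio}x")
```

The factor depends only on the number of sessions, not on the document size, so the same ratio holds for any frequently reused document.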

Normally a KV cache only helps if the context prefix is identical, but there are papers demonstrating that documents can be cached across different contexts.
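A toy sketch of why the prefix has to match: the class below is not any real inference engine's API, just a dictionary keyed by exact token prefixes, with a placeholder where the real key/value tensors would live. The same document placed behind a different first token gets no reuse at all.

```python
class PrefixKVCache:
    """Toy stand-in for a prefix-keyed KV cache (hypothetical, for illustration)."""

    def __init__(self):
        self._store = {}

    def longest_cached_prefix(self, tokens):
        # Walk back from the full sequence until an exact prefix is found.
        for end in range(len(tokens), 0, -1):
            if tuple(tokens[:end]) in self._store:
                return end
        return 0

    def prefill(self, tokens):
        """Return how many tokens had to be computed rather than reused."""
        reused = self.longest_cached_prefix(tokens)
        for end in range(reused + 1, len(tokens) + 1):
            self._store[tuple(tokens[:end])] = True  # placeholder for K/V tensors
        return len(tokens) - reused


doc = [f"d{i}" for i in range(5)]  # the shared document, 5 tokens

cache = PrefixKVCache()
first = cache.prefill(["sysA"] + doc + ["q1"])           # cold cache
same_prefix = cache.prefill(["sysA"] + doc + ["q2"])     # identical prefix, new question
different_prefix = cache.prefill(["sysB"] + doc + ["q1"])  # same document, different prefix

print(first, same_prefix, different_prefix)
```

In the third call the document tokens are identical, but every cached entry starts with `sysA`, so nothing matches. That is the limitation the cross-context caching papers try to get around.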


Replies

brookst, yesterday at 11:29 PM

Ah, understood, and thanks for the clarification!