
reitzensteinm • yesterday at 11:28 PM

It's a good call out re: tokens vs letters, but I think you might have misunderstood my point - you can't do it a token at a time unless the intermediate KV cache is stored after each token is generated.

This won't be the case in any non-toy implementation, as it would be unnecessary and slow.

Replies

jgeralnik • today at 5:51 AM

Ah, fair enough. Anthropic caches at a block level (basically a single message), so for non-trivial messages this is really less of a concern, although I definitely understand why they still scope the cache to a single tenant.
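
To make the granularity point concrete, here is a minimal sketch of a block-level prefix cache scoped per tenant. Everything in it is an assumption for illustration: the `BlockPrefixCache` class, the `tenant` string, and the tiny `BLOCK_SIZE` are hypothetical and do not describe Anthropic's or any other provider's real implementation. It only stores hashes of whole blocks, so a probe that guesses one extra token learns nothing unless it completes an entire block, and a different tenant never hits at all.

```python
import hashlib

BLOCK_SIZE = 4  # hypothetical; chosen small so the example is easy to follow


class BlockPrefixCache:
    """Toy prefix cache: records which whole-block prefixes have been seen."""

    def __init__(self, block_size=BLOCK_SIZE):
        self.block_size = block_size
        self._blocks = set()

    def _keys(self, tenant, tokens):
        # Chain the hash so each key depends on the tenant and the full prefix,
        # not just on the contents of the current block.
        h = hashlib.sha256(tenant.encode())
        full = len(tokens) - len(tokens) % self.block_size
        for start in range(0, full, self.block_size):
            block = tokens[start:start + self.block_size]
            h.update(repr(block).encode())
            yield h.copy().hexdigest()

    def store(self, tenant, tokens):
        # Only complete blocks are cached; a trailing partial block is dropped.
        self._blocks.update(self._keys(tenant, tokens))

    def cached_prefix_len(self, tenant, tokens):
        # Number of leading tokens that would be served from cache.
        hit = 0
        for key in self._keys(tenant, tokens):
            if key not in self._blocks:
                break
            hit += self.block_size
        return hit


cache = BlockPrefixCache()
secret = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
cache.store("tenant-a", secret)

# Guessing the first five tokens correctly still only reveals one whole block.
partial_guess = cache.cached_prefix_len("tenant-a", [1, 2, 3, 4, 5, 99, 99, 99])
# A different tenant gets no hit even with the exact same tokens.
cross_tenant = cache.cached_prefix_len("tenant-b", secret)
```

With per-token entries, an attacker could confirm a guess one token at a time; here a guess has to get a full block right before the cache hit (and any timing difference) shows up, and the tenant scoping removes cross-tenant hits entirely.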
