Hacker News

zozbot234 | yesterday at 7:21 PM

Grug says prompt caching just store KV-cache which is sequenced by token. Easy cut it back to just before edit. Then regenerate after is just like prefill but tiny.
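
A rough sketch of that idea, not how any particular provider implements prompt caching. `ToyPrefixCache` and `toy_kv_entry` are made-up names, and each "KV entry" is just an integer standing in for the real key/value tensors at one token position:

```python
from __future__ import annotations

from dataclasses import dataclass, field


def toy_kv_entry(prev_state: int, token: int) -> int:
    """Hypothetical stand-in for one position's K/V; depends on all earlier tokens."""
    return (prev_state * 1_000_003 + token + 1) % (2**61 - 1)


@dataclass
class ToyPrefixCache:
    tokens: list[int] = field(default_factory=list)
    kv: list[int] = field(default_factory=list)  # one entry per token position

    def update(self, new_tokens: list[int]) -> int:
        """Reuse the shared prefix, recompute the rest; return positions recomputed."""
        # Length of the longest common prefix between cached and new tokens.
        keep = 0
        for old, new in zip(self.tokens, new_tokens):
            if old != new:
                break
            keep += 1
        # Cut the cache back to just before the first changed token.
        del self.tokens[keep:]
        del self.kv[keep:]
        # "Prefill" only the suffix that follows the edit point.
        state = self.kv[-1] if self.kv else 0
        for token in new_tokens[keep:]:
            state = toy_kv_entry(state, token)
            self.tokens.append(token)
            self.kv.append(state)
        return len(new_tokens) - keep


cache = ToyPrefixCache()
print(cache.update([1, 2, 3, 4, 5]))  # cold cache: all 5 positions computed
print(cache.update([1, 2, 3, 9, 5]))  # edit at index 3: only 2 positions recomputed
```

Every position after the edit is recomputed because each entry depends on the tokens before it, so the re-prefill is only tiny when the edit sits near the end of the prompt.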