Hacker News

zozbot234 · yesterday at 8:01 PM

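The thing about the context/KV cache is that you can swap it out efficiently: each token's keys and values are written once and then only read, so older entries can sit in slower memory and be streamed back when needed. You can't do that with the activations, because they're rewritten for every token. It will slow down as the context grows (decode is often compute-limited when the context is large), but it will run.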
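A toy sketch of the point, assuming a single attention head and plain Python lists standing in for fast GPU memory (`device`) and slower system RAM or disk (`host`). The class and all its names are hypothetical, not any inference engine's API. Because each token's key/value pair is appended once and never modified, old entries can be moved to the slow tier and read back for attention without changing the result. The cost is that more data has to be streamed back on every decode step as the context grows.

```python
import math


class OffloadedKVCache:
    """Hypothetical single-head KV cache that spills old entries to a slow tier.

    'device' and 'host' are plain lists standing in for GPU memory and
    system RAM/disk; this is an illustration, not a real runtime.
    """

    def __init__(self, device_capacity):
        self.device_capacity = device_capacity
        self.device = []  # most recent (key, value) pairs, fast tier
        self.host = []    # older (key, value) pairs, swapped out
        self.entries_streamed = 0  # entries read back from host (cost proxy)

    def append(self, key, value):
        # Each token's K/V is written exactly once and never modified,
        # so evicting the oldest entry to the slow tier loses nothing.
        self.device.append((key, value))
        if len(self.device) > self.device_capacity:
            self.host.append(self.device.pop(0))

    def attend(self, query):
        # Swapped-out entries are read back on every decode step,
        # so per-token cost grows with total context length.
        self.entries_streamed += len(self.host)
        entries = self.host + self.device  # oldest first, same order as an unswapped cache
        if not entries:
            raise ValueError("cache is empty")
        scale = 1.0 / math.sqrt(len(query))
        scores = [sum(q * k for q, k in zip(query, key)) * scale for key, _ in entries]
        peak = max(scores)
        weights = [math.exp(s - peak) for s in scores]  # numerically stable softmax
        total = sum(weights)
        dim = len(entries[0][1])
        return [
            sum(w * v[i] for w, (_, v) in zip(weights, entries)) / total
            for i in range(dim)
        ]


if __name__ == "__main__":
    swapped = OffloadedKVCache(device_capacity=2)    # spills after 2 tokens
    resident = OffloadedKVCache(device_capacity=100)  # never spills
    for t in range(5):
        key, value = [float(t), 1.0], [float(t), -float(t)]
        swapped.append(key, value)
        resident.append(key, value)
    query = [0.5, -0.25]
    print(swapped.attend(query), resident.attend(query))
    print("entries streamed back:", swapped.entries_streamed)
```

Both caches print the same attention output; only the swapped one pays for reading entries back from the slow tier. Activations have no equivalent append-only structure, which is why they can't be offloaded this way.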