Hacker News

whimsicalism · 08/01/2025

would loading the KV cache from disk be faster than just recomputing it?
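
A rough back-of-envelope way to frame that question, as a sketch only: compare the time to read the cache off disk against the time to rerun the prefill. Every number below (model shape, disk bandwidth, effective FLOP/s) is a made-up assumption for illustration, the function names are hypothetical, and the recompute estimate uses the common ~2 FLOPs per parameter per token approximation, which ignores attention's quadratic term.

```python
def kv_cache_bytes(n_tokens, n_layers, n_kv_heads, head_dim, bytes_per_elem=2):
    """Size of the KV cache: keys + values for every layer and KV head."""
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem * n_tokens


def load_seconds(n_bytes, disk_bytes_per_s):
    """Time to stream the cache from disk at a sustained bandwidth."""
    return n_bytes / disk_bytes_per_s


def recompute_seconds(n_tokens, n_params, flops_per_s):
    """Time to redo the prefill, assuming ~2 FLOPs per parameter per token."""
    return 2 * n_params * n_tokens / flops_per_s


if __name__ == "__main__":
    # All values are assumptions, not measurements of any real system.
    n_tokens = 10_000
    size = kv_cache_bytes(n_tokens, n_layers=32, n_kv_heads=8, head_dim=128)
    t_load = load_seconds(size, disk_bytes_per_s=3e9)          # assumed NVMe-class read speed
    t_recompute = recompute_seconds(n_tokens, n_params=8e9,    # assumed 8B-parameter model
                                    flops_per_s=1e14)          # assumed effective throughput
    print(f"cache size: {size / 1e9:.2f} GB")
    print(f"load: {t_load:.2f} s, recompute: {t_recompute:.2f} s")
```

Which side wins depends entirely on the ratio of disk bandwidth to effective compute, plus the model's KV width (number of KV heads, head size, precision), so swapping in real numbers for a given setup is the point of the sketch.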

imo the discontinuous segments bit would not work because of the causal dependence in transformers + RoPE as you mention, but maybe it could be possible