
in-silico • today at 3:19 AM

This is really a matter of semantics, but I wouldn't call attending to the KV cache "re-reading the context."

The model takes in the context, encodes it into a "memory" (the KV cache), and accesses that memory later. That fact doesn't change just because the KV cache grows in size with the context.
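To make that concrete, here's a toy sketch of what I mean by encode-then-retrieve. It's a single attention head in pure Python with random weights; `KVCache`, `step`, `recompute_last`, and the width `D` are names I made up for illustration, not any library's API. Each token is projected into a key and value exactly once and appended to the cache; after that, only its stored key/value gets looked up, never the raw token.

```python
import math
import random

D = 4  # toy embedding width (an assumption for this sketch)


def make_matrix(rng, rows, cols):
    """Random rows x cols weight matrix as nested lists."""
    return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]


def project(x, w):
    """Vector-matrix product: x (len rows) times w (rows x cols)."""
    return [sum(x[i] * w[i][j] for i in range(len(x))) for j in range(len(w[0]))]


def attend(q, keys, values):
    """Scaled dot-product attention of one query over stored keys/values."""
    scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(len(q)) for k in keys]
    top = max(scores)  # subtract the max for a numerically stable softmax
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [
        sum(w / total * v[j] for w, v in zip(weights, values))
        for j in range(len(values[0]))
    ]


class KVCache:
    """Hypothetical single-head 'memory': encode once, retrieve many times."""

    def __init__(self, w_q, w_k, w_v):
        self.w_q, self.w_k, self.w_v = w_q, w_k, w_v
        self.keys = []
        self.values = []

    def step(self, x):
        # Encode: the new token is projected once and appended to memory.
        self.keys.append(project(x, self.w_k))
        self.values.append(project(x, self.w_v))
        # Retrieve: the query reads the stored keys/values, not the raw tokens.
        return attend(project(x, self.w_q), self.keys, self.values)


def recompute_last(tokens, w_q, w_k, w_v):
    """Baseline that really does re-encode the whole context from scratch."""
    keys = [project(t, w_k) for t in tokens]
    values = [project(t, w_v) for t in tokens]
    return attend(project(tokens[-1], w_q), keys, values)


rng = random.Random(0)
w_q, w_k, w_v = (make_matrix(rng, D, D) for _ in range(3))
tokens = [[rng.uniform(-1, 1) for _ in range(D)] for _ in range(5)]

cache = KVCache(w_q, w_k, w_v)
outputs = [cache.step(t) for t in tokens]
baseline = recompute_last(tokens, w_q, w_k, w_v)
```

The cached path and the recompute-everything path give the same output for the last token, but only the second one touches the original tokens again. The cache does grow by one entry per token, which is the part people object to, but the access pattern is still write once, read later.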

I don't know what memory would look like other than an encode-retrieve loop.

Relevant: Transformers are Multi-State RNNs - https://arxiv.org/abs/2401.06104
