
soerxpso · last Sunday at 7:36 PM

I believe you're misunderstanding what the OP means by "long-term" memory. From what I can tell, it's not actively modifying the weights of the underlying model; it just "remembers" things from far back in its context. The point is that it can recall something it read ~200 pages earlier within a single very long context window, not that it retains anything from one session into another clean session.


Replies

AlexCoventry · last Monday at 12:09 AM

This model has fast weights, which actually are modified during inference.
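For anyone unfamiliar with the term: "fast weights" are a second set of parameters updated by an associative rule while the model runs, separate from the slow pretrained weights. Below is a minimal sketch of the general idea (a Hebbian outer-product update, as in the fast-weight-programmers literature); the toy dimensions and the update rule are illustrative assumptions, not this model's actual mechanism.

    # Minimal sketch of the fast-weights idea: Hebbian-style outer-product
    # updates applied at inference time. Illustrative toy only.
    import numpy as np

    d = 8                          # toy embedding dimension (assumption)
    W = np.zeros((d, d))           # fast weights: start empty each "session"

    rng = np.random.default_rng(0)
    key, value = rng.standard_normal(d), rng.standard_normal(d)

    # "Write": processing an input updates the weights themselves.
    W += np.outer(value, key)

    # "Read": a later query retrieves the stored association from W,
    # scaled by ||key||^2 since keys here aren't normalized.
    retrieved = W @ key
    print(np.allclose(retrieved, value * (key @ key)))   # True

This also shows why such memory doesn't survive a clean session: the fast weights W are state that must be carried forward, unlike the frozen pretrained weights.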
