FWIW there are already a number of proposals for augmenting LLMs with long-term memory, and many of them show promising results.
So perhaps what's needed is not a discovery, but a way to identify the optimal method.
Note that it's hard to come up with a long-term memory test that would be different from either a long-context test (i.e. the LLM remembers something over a long distance) or a RAG-like test.