There's a chance this memory problem is not going to be that easy to solve. It's true context lengths have gotten much longer, but not all context is created equal.
There's a significant loss of model sharpness as context goes past 100K tokens. Sometimes earlier, sometimes later. Even using context windows to their maximum extent today, the models aren't always especially nuanced over long contexts. I compact after 100K tokens.
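Roughly what I mean by compacting, as a sketch. The token counting and the summarize step here are stand-ins for whatever tokenizer and model call you'd actually use:

```python
# Sketch of compaction: once the running token count passes a threshold,
# fold the older turns into a summary and keep only the recent tail verbatim.
# `count_tokens` and `summarize` are placeholders, not a real library API.

COMPACT_THRESHOLD = 100_000  # tokens
KEEP_RECENT = 20             # most recent turns kept verbatim


def count_tokens(messages):
    # crude approximation: ~4 characters per token
    return sum(len(m["content"]) for m in messages) // 4


def summarize(messages):
    # placeholder: in practice a separate LLM call that condenses
    # the older turns into a short running summary
    return "Summary of earlier conversation: ..."


def maybe_compact(messages):
    if count_tokens(messages) < COMPACT_THRESHOLD:
        return messages
    older, recent = messages[:-KEEP_RECENT], messages[-KEEP_RECENT:]
    summary_msg = {"role": "system", "content": summarize(older)}
    return [summary_msg] + recent
```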
In my experience the context window by itself tells only half the story. Load a big 200k-token document and ask it a question, and it will answer just fine. But start a conversation that balloons past 100k tokens and it starts losing coherence pretty quickly. So I'd guess how the context arrives, one big chunk versus an accumulating conversation, matters more than the raw window size.
I'm oversimplifying here, but graph databases and knowledge graphs exist. An LLM doesn't need to preserve everything in context, just what it needs for that conversation.
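A toy version of the idea, with plain dicts standing in for a real graph database and the entities and relations made up:

```python
# Facts live outside the context window in a tiny graph, and only the
# neighborhood relevant to the current question gets pulled back into
# the prompt. Plain dicts here stand in for a real graph DB.

from collections import defaultdict

graph = defaultdict(list)  # node -> list of (relation, node)


def add_fact(subject, relation, obj):
    graph[subject].append((relation, obj))


def neighborhood(entity, depth=1):
    """Collect facts within `depth` hops of an entity."""
    facts, frontier = [], {entity}
    for _ in range(depth):
        next_frontier = set()
        for node in frontier:
            for relation, obj in graph[node]:
                facts.append(f"{node} {relation} {obj}")
                next_frontier.add(obj)
        frontier = next_frontier
    return facts


add_fact("Alice", "works_at", "Acme")
add_fact("Acme", "headquartered_in", "Berlin")

# Only these few lines go into the prompt, not the whole memory:
print(neighborhood("Alice", depth=2))
# ['Alice works_at Acme', 'Acme headquartered_in Berlin']
```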
Context will need to go in layers. Like when you tell someone what you do for a living, your first version will be very broad. But when they ask the right questions, you can dive into the details pretty quickly.
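In code terms that could mean keeping summaries at a few levels of detail and only expanding a level when the question touches it. Everything here (the topics, the naive keyword match) is just illustrative:

```python
# Layered context: the broad summary is always in the prompt, and finer
# layers only get expanded when the question mentions them. The keyword
# matching is deliberately naive.

memory = {
    "summary": "I'm a backend engineer at a payments company.",
    "topics": {
        "payments": "I mostly work on the settlement pipeline and ledger.",
        "oncall":   "I'm on a weekly on-call rotation for the API gateway.",
    },
    "details": {
        "payments": "Settlement runs nightly; the ledger is event-sourced in Postgres.",
    },
}


def build_context(question):
    parts = [memory["summary"]]                      # layer 1: always included
    for topic, blurb in memory["topics"].items():
        if topic in question.lower():                # layer 2: on topic match
            parts.append(blurb)
            detail = memory["details"].get(topic)    # layer 3: only if present
            if detail:
                parts.append(detail)
    return "\n".join(parts)


print(build_context("How do payments get settled?"))
```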
But you don't have to hold the entire memory in context. You just need good techniques for pulling in the parts of the context you actually need. This can be done via RAG, multi-agent architectures, and so on. It's not perfect, but it will get better over time.
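A bare-bones version of the RAG part of that, with word overlap standing in for real embeddings (swap in an embedding model for anything serious):

```python
# Score stored memory chunks against the query and only put the top few
# into the prompt. Bag-of-words overlap is a stand-in for embeddings.

def score(query, chunk):
    q, c = set(query.lower().split()), set(chunk.lower().split())
    return len(q & c) / (len(q) or 1)


def retrieve(query, chunks, k=2):
    ranked = sorted(chunks, key=lambda ch: score(query, ch), reverse=True)
    return ranked[:k]


memory_chunks = [
    "User prefers concise answers with code examples.",
    "Project uses Postgres 15 and runs on Kubernetes.",
    "User's name is Dana; timezone is UTC+2.",
]

prompt_context = retrieve("what database does the project use", memory_chunks)
print(prompt_context)
```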