stephschie today at 12:34 PM

Hmm, I'm not convinced that is the direction we want to go in. It's not like we have all the context of everything we ever learned present when making decisions. Heck, even for CPUs and GPUs we have a strict hierarchy of caches (L1, L2, shared L3) out to larger memory units, with constant management of those. Feel free to surprise me, but I believe having a similar stack for LLMs is the better way to go, where we will have short-term memory (system prompt, prompt, task), mid-term memory (session knowledge, preferences), long-term memory (project knowledge, tech/stack insights), and intuition memory (stemming from language, physics, rules). But right now we haven't developed best practices yet for what information should go into which layer at what times. Increasing the overall context window is nice, but IMHO won't help us much.
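(A minimal sketch of what such a layered stack might look like, in Python. MemoryLayer, assemble_context, and the token budgets are illustrative names and numbers, not an existing API; the unanswered question is precisely the policy for promoting and evicting entries between layers.)

    from dataclasses import dataclass, field

    @dataclass
    class MemoryLayer:
        name: str
        budget_tokens: int               # share of the window this layer may use
        entries: list[str] = field(default_factory=list)

    def assemble_context(layers: list[MemoryLayer], count_tokens) -> str:
        # Fill the prompt layer by layer; overflow within a layer is
        # dropped, like an eviction from a fixed-size cache.
        parts = []
        for layer in layers:
            used = 0
            for entry in layer.entries:
                n = count_tokens(entry)
                if used + n > layer.budget_tokens:
                    break
                parts.append(entry)
                used += n
        return "\n\n".join(parts)

    stack = [
        MemoryLayer("short-term (system prompt, prompt, task)", 4_000),
        MemoryLayer("mid-term (session knowledge, preferences)", 8_000),
        MemoryLayer("long-term (project knowledge, stack insights)", 16_000),
    ]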


Replies

itissid today at 1:59 PM

But in-context learning could be better. One important thing here is also the ability to align on what to pay more or less attention to, no matter the knowledge base. These are the highest-leverage points that need to be exposed to a human to think and reason over. Constrained/guardrailed development tasks work fine*, but exploring new directions, as opposed to exploiting local minima, is still an Achilles' heel: even with all this knowledge, unless there is sufficient steering and exploration, the minima-seeking behavior "tries" hard to win.

* With Claude's 1-million-token context window I have been doing some slightly longer-range tasks (~1-3 days of work) with RPI/QRSPI frameworks (see my comments elsewhere on HN from the last few days) in one context window. They involve a grill-me session of 20-60, sometimes more, questions per task to get alignment, which produces the design and the plan in one window.

user2722 today at 1:16 PM

I have a simple and brittle system to track people, facts, and associations in newspapers, which is basically: "LLM, extract people, places, projects, and structures, and save them as an Obsidian-compatible graph network."

For 2 or 3 newspapers it works; my idea was to use it as grounding to discover relationships between people, companies and jobs.
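(The whole pipeline can be sketched in a few lines of Python. extract_entities is a stand-in for whatever LLM call does the extraction, and the "graph" is nothing more than Markdown notes whose [[wikilinks]] Obsidian renders as edges.)

    import os
    import re

    def extract_entities(article_text: str) -> dict:
        # Hypothetical LLM call returning e.g.
        # {"people": [...], "places": [...], "projects": [...], "structures": [...]}
        raise NotImplementedError

    def slug(name: str) -> str:
        # Strip characters that are not filename-safe.
        return re.sub(r"[^\w\- ]", "", name).strip()

    def save_note(title: str, entities: dict, vault: str = "vault") -> None:
        os.makedirs(vault, exist_ok=True)
        links = [f"[[{slug(n)}]]" for names in entities.values() for n in names]
        note = f"# {title}\n\n" + "\n".join(links) + "\n"
        with open(os.path.join(vault, slug(title) + ".md"), "w") as f:
            f.write(note)    # one note per article; the links form the graph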

As for the "everyone's life" use case, I have always assumed there would be a graph system pointing to "forgotten" documents.

Gemini said my idea was amazing and new in its implementation, even if not in spirit, but I'm assuming it was being sycophantic as usual.

bhouston today at 1:16 PM

> Hmm, I'm not convinced that is the direction we want to go in. It's not like we have all the context of everything we ever learned present when making decisions.

I do not think it is the direction for everything.

Generally, we need consolidation of experiences and memories to remember just the important conclusions, ideas, and concepts, plus the ability to recall the full details when they are relevant (which they usually are not).
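(One way to read "consolidation plus recall" as code, with summarize and is_relevant standing in for LLM or embedding calls; a sketch of the idea, not a tested design.)

    class ConsolidatingMemory:
        def __init__(self, summarize, is_relevant):
            self.summaries = []    # the consolidated view that stays in context
            self.archive = []      # full detail, recalled only on demand
            self.summarize = summarize
            self.is_relevant = is_relevant

        def record(self, experience: str) -> None:
            self.archive.append(experience)
            self.summaries.append(self.summarize(experience))

        def recall(self, query: str) -> list[str]:
            hits = [e for e in self.archive if self.is_relevant(query, e)]
            return hits or self.summaries    # usually the summaries suffice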

But for some applications I am sure a billion token context would be useful.

It is likely most people need a 10-core CPU or whatever for most tasks, but for some applications you want a supercomputer with 1M cores.

lumost today at 1:13 PM

Currently, it is difficult to live update the model’s parameters in response to new information. This difficulty applies at both an infrastructural level and an optimization level.

We simply don't know how to reliably incorporate new information without losing old capabilities. Humans handle this through extensive evaluation, heuristics, and experience.
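(A common but only partially effective mitigation is experience replay: mix a sample of old data into every parameter update. A toy PyTorch sketch, where the Linear layer stands in for a real model.)

    import random
    import torch
    import torch.nn.functional as F

    model = torch.nn.Linear(16, 4)    # stand-in for an LLM
    opt = torch.optim.SGD(model.parameters(), lr=1e-2)
    replay = []                       # (input, label) pairs from old data

    def update(x: torch.Tensor, y: torch.Tensor, replay_k: int = 8) -> None:
        # Train on the new example plus a random sample of old ones,
        # so the gradient step is not driven solely by the newest data.
        batch = [(x, y)] + random.sample(replay, min(replay_k, len(replay)))
        xs = torch.stack([b[0] for b in batch])
        ys = torch.stack([b[1] for b in batch])
        loss = F.cross_entropy(model(xs), ys)
        opt.zero_grad()
        loss.backward()
        opt.step()
        replay.append((x, y))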

What we do know is that models can adapt to their context, and extending the context window is first and foremost an infrastructure and capex problem. A billion useful tokens would obviate the need for any out-of-band memory structures.
