Hacker News

sosodev · today at 5:40 PM

My understanding, from listening to and reading what top researchers are saying, is that model architectures in the near future are going to attempt to scale the context window dramatically. There's a general belief that in-context learning is quite powerful and that scaling the window might yield massive benefits for continual learning.

It doesn't seem that hard, because recent open-weight models have shown that the memory cost of the context window can be dramatically reduced via hybrid attention architectures. Qwen3-next, Qwen3.5, and Nemotron 3 Nano are all great examples. Nemotron 3 Nano can be run with a million-token context window on consumer hardware.
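To make the memory claim concrete, here's a back-of-the-envelope KV-cache sizing sketch. In a hybrid stack, only a minority of layers use full attention (and thus cache K/V per token); the linear-attention/SSM layers keep fixed-size state regardless of context length. All model dimensions below are illustrative assumptions, not the actual configs of any of the models named above.

```python
# Back-of-the-envelope KV-cache sizing: dense full attention vs. a hybrid
# stack. Dimensions (layers, heads, head_dim) are illustrative assumptions.

def kv_cache_bytes(tokens: int, full_attn_layers: int,
                   kv_heads: int = 8, head_dim: int = 128,
                   dtype_bytes: int = 2) -> int:
    """Bytes of K and V cached per token, summed over full-attention layers."""
    return 2 * tokens * full_attn_layers * kv_heads * head_dim * dtype_bytes

ctx = 1_000_000  # one million tokens of context

# Dense model: all 48 layers use full attention and cache K/V per token.
dense = kv_cache_bytes(ctx, full_attn_layers=48)

# Hybrid model: only 8 of 48 layers are full attention; the remaining
# layers use linear attention / SSM blocks whose state doesn't grow
# with context length.
hybrid = kv_cache_bytes(ctx, full_attn_layers=8)

print(f"dense : {dense / 2**30:.1f} GiB")   # ~183 GiB
print(f"hybrid: {hybrid / 2**30:.1f} GiB")  # ~31 GiB
```

Under these made-up dimensions, the dense cache (~183 GiB) is hopeless on consumer hardware, while cutting full-attention layers to 1-in-6 brings it near the 32 GB of a high-end consumer GPU; real hybrid models also shrink kv_heads via GQA for further savings.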


Replies

mccoyb · today at 6:42 PM

I don't disagree with this, but memory cost isn't the only issue, right? I remember using Sonnet 4.5 (or 4; I can't remember which was the first of Anthropic's offerings with a million-token context), how slow the model would get, and how much it wanted to end the session early as tokens accrued (that latter point, of course, is just an artifact of bad training).

I'm less worried about memory and more worried about compute speed. Are the two obviously related, and is the relationship straightforward to see?
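They are related, at least during decoding: generating each new token requires attending over (i.e., reading) the entire KV cache once, so per-token latency has a lower bound of roughly cache_bytes / memory_bandwidth. A hedged sketch, reusing the same illustrative dimensions as above and an assumed 1 TB/s of memory bandwidth:

```python
# Rough sketch of why KV-cache memory and decode speed are linked: each
# generated token must read the whole cache once, so per-token latency is
# bounded below by cache_bytes / bandwidth. All numbers are illustrative.

def decode_latency_floor_s(context_tokens: int,
                           layers: int = 48, kv_heads: int = 8,
                           head_dim: int = 128, dtype_bytes: int = 2,
                           bandwidth_bytes_per_s: float = 1e12) -> float:
    cache_bytes = 2 * context_tokens * layers * kv_heads * head_dim * dtype_bytes
    return cache_bytes / bandwidth_bytes_per_s

for ctx in (10_000, 100_000, 1_000_000):
    ms = decode_latency_floor_s(ctx) * 1000
    print(f"{ctx:>9} tokens: >= {ms:.0f} ms/token")
```

The floor grows linearly with context, which matches the "model gets slower as tokens accrue" experience even when everything fits in memory; and since shrinking the cache (hybrid layers, GQA) shrinks the bytes read per token, the memory optimization and the speed problem attack the same quantity.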
