Hacker News

gskm · today at 7:56 AM · 1 reply

Spot on. That cliff might be less about the model failing at distance and more about noise accumulating faster than signal. In prod, most of what fills the window is file reads, grep output, and tool overhead, i.e., low-value tokens. By 700k you're not really testing long-context reasoning; you're testing the model's ability to find the needle in a haystack it built itself.
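The claim can be made concrete with a toy accounting of what fills an agent's context window. Everything here is illustrative: the event categories and token counts are invented, not measured from any real agent.

```python
# Toy sketch: how much of an agent's context window is low-value tool output.
# Categories and token counts below are hypothetical, for illustration only.
from collections import Counter

# Hypothetical agent-loop transcript: (category, token_count) per event.
transcript = [
    ("user_prompt", 500),
    ("model_reasoning", 2_000),
    ("file_read", 40_000),
    ("grep_output", 15_000),
    ("tool_overhead", 5_000),
    ("model_reasoning", 3_000),
    ("file_read", 60_000),
]

LOW_VALUE = {"file_read", "grep_output", "tool_overhead"}

totals = Counter()
for category, tokens in transcript:
    totals[category] += tokens

window = sum(totals.values())
noise = sum(t for c, t in totals.items() if c in LOW_VALUE)
print(f"window: {window} tokens, low-value share: {noise / window:.0%}")
# → window: 125500 tokens, low-value share: 96%
```

Under these made-up numbers, the model's own reasoning is a few percent of the window; the rest is haystack it generated by reading and grepping.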

