Hacker News

gskm · today at 7:56 AM · 1 reply

Spot on. That cliff might be less about the model failing at distance and more about noise accumulating faster than signal. In prod, most of what fills the window is file reads, grep output, and tool overhead, i.e., low-value tokens. By 700k you're not really testing long-context reasoning; you're testing the model's ability to find the needle in a haystack it built itself.
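The claim can be made concrete with a toy accounting of what fills an agent's context window. Everything here is illustrative: the event categories and token counts are invented, not measured from any real agent.

```python
# Toy sketch: how much of an agent's context window is low-value tool output.
# Categories and token counts below are hypothetical, for illustration only.
from collections import Counter

# Hypothetical agent-loop transcript: (category, token_count) per event.
transcript = [
    ("user_prompt", 500),
    ("model_reasoning", 2_000),
    ("file_read", 40_000),
    ("grep_output", 15_000),
    ("tool_overhead", 5_000),
    ("model_reasoning", 3_000),
    ("file_read", 60_000),
]

LOW_VALUE = {"file_read", "grep_output", "tool_overhead"}

totals = Counter()
for category, tokens in transcript:
    totals[category] += tokens

window = sum(totals.values())
noise = sum(t for c, t in totals.items() if c in LOW_VALUE)
print(f"window: {window} tokens, low-value share: {noise / window:.0%}")
# → window: 125500 tokens, low-value share: 96%
```

Under these made-up numbers, the model's own reasoning is a few percent of the window; the rest is haystack it generated by reading and grepping.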

