Is there a writeup anywhere on what this means for effective context? I think many of us have found that even when the context window was 100k tokens, the actual usable window was smaller than that. As you got closer to 100k, performance degraded substantially. I'm assuming that's still true, but what does the curve look like?
Personally, even though performance up to 200k has improved a lot with 4.5 and 4.6, I still try to avoid getting up there. Like I said in another comment, when I see context getting up to even 100k, I start making sure I have enough written to disk that I can type /new, pipe it the diff so far, and just say “keep going.” I feel like the dropoff starts around maybe 150k, but I could be completely wrong. I thought it was funny that the graph in the post starts at 256k, which conveniently avoids showing the dropoff I'm talking about (if it's real).
I mentioned this at work too: context still rots at the same rate. 90k tokens consumed gives results just as bad in a 100k context window as in a 1M one.
Personally, I’m on a 6M+ line codebase and had no problems with the old window. I’m not sending it into the codebase blindly like I do for small projects, though. Good prompts are necessary at scale.
The benchmark charts provided are the writeup. Everything else is just anecdata.
Isn't transformer attention quadratic in context length? To get to a 1M token context, I think these models have to be employing a lot of shortcuts.
I'm not an expert, but maybe this explains context rot.
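For a rough sense of where the quadratic cost comes from, here's a toy NumPy sketch of plain scaled dot-product attention (single head, no batching; the names and sizes are purely illustrative, not anything the real models do):

```python
import numpy as np

def naive_attention(Q, K, V):
    # Q, K, V each have shape (seq_len, d).
    # scores has shape (seq_len, seq_len): this is the quadratic term,
    # so doubling the context quadruples the memory and compute spent here.
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))  # numerically stable softmax
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ V

# Toy sizes. At seq_len = 1,000,000 the scores matrix alone would hold
# 10^12 floats (~4 TB in fp32), which is why 1M-token models presumably
# rely on some mix of sparsity, chunking, or other approximations
# rather than doing this naively.
seq_len, d = 1024, 64
Q = np.random.randn(seq_len, d)
K = np.random.randn(seq_len, d)
V = np.random.randn(seq_len, d)
out = naive_attention(Q, K, V)  # shape (1024, 64)
```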
> As you got closer to 100k performance degraded substantially
In practice, I haven't found this to be the case at all with Claude Code using Opus 4.6. So maybe it's another one of those things that used to be true, and now we all just expect it to still be true.
And of course, when we expect something, we'll find it, so any mistake at 150k of context gets attributed to the context, while the same mistake at 50k gets attributed to the model.