A lot of that seems to be the usual "you're training them wrong".
Sonnet 3.5 is old hat; today's Sonnet 4.6 ships with an extra-long 1M-token context window. And it performs better on long-context tasks while it's at it.
There are also attempts to address long-context attention performance on the architectural side: streaming attention, learned KV dropout, differential attention. All of these can help LLMs sustain longer sessions and make better use of long contexts.
If we're comparing to wet meat, the closest thing humans have to context is working memory. Humans also get a limited amount of it, but manage complex work by swapping things in and out of it. LLMs can be trained to do the same. Today's tools like file search and context compression are crude versions of that.
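To make the "swapping in and out" concrete, here's a toy sketch: a fixed-capacity context backed by an external store, with an LRU eviction policy. The class name, the store layout, and the LRU policy are all my own illustration, not how any real agent tool implements this.

```python
from collections import OrderedDict

class WorkingContext:
    """Toy fixed-capacity context backed by an external store.

    Chunks live in `store`; only `capacity` of them fit in the
    active context at once. Loading a chunk evicts the
    least-recently-used one -- a crude stand-in for how an agent
    pages material through a limited window.
    """
    def __init__(self, store, capacity=3):
        self.store = store            # external memory: chunk_id -> text
        self.capacity = capacity      # how many chunks fit "in context"
        self.active = OrderedDict()   # currently loaded chunks, LRU order

    def load(self, chunk_id):
        if chunk_id in self.active:
            self.active.move_to_end(chunk_id)    # refresh recency
        else:
            if len(self.active) >= self.capacity:
                self.active.popitem(last=False)  # evict the LRU chunk
            self.active[chunk_id] = self.store[chunk_id]
        return self.active[chunk_id]

    def in_context(self):
        return list(self.active)

store = {f"doc{i}": f"contents of doc{i}" for i in range(5)}
ctx = WorkingContext(store, capacity=2)
ctx.load("doc0"); ctx.load("doc1"); ctx.load("doc2")
print(ctx.in_context())  # doc0 was evicted; doc1 and doc2 remain
```

The point isn't the eviction policy; it's that the model only ever sees the small `active` set, while the full `store` can be arbitrarily large.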
I know Sonnet 4.6 has a 1M context window. I use it every day. But in my experience with Claude Code and Cursor, performance clearly degrades somewhere between 20k and 200k tokens of context. External memory is where the real fix is, not bigger windows.