"On March 26, we shipped a change to clear Claude's older thinking from sessions that had ...

6keZbCECT2uB • yesterday at 6:30 PM • 5 replies • view on HN

"On March 26, we shipped a change to clear Claude's older thinking from sessions that had been idle for over an hour, to reduce latency when users resumed those sessions. A bug caused this to keep happening every turn for the rest of the session instead of just once, which made Claude seem forgetful and repetitive. We fixed it on April 10. This affected Sonnet 4.6 and Opus 4.6"

This makes no sense to me. I often leave sessions idle for hours or days and use the capability to pick it back up with full context and power.

The default thinking level seems more forgivable, but the churn in system prompts is something I'll need to figure out how to intentionally choose a refresh cycle.

Replies

bcherny • yesterday at 7:02 PM

Hey, Boris from the Claude Code team here.

Normally, when you have a conversation with Claude Code, if your convo has N messages, then (N-1) messages hit prompt cache -- everything but the latest message.

The challenge is: when you let a session idle for >1 hour, when you come back to it and send a prompt, it will be a full cache miss, all N messages. We noticed that this corner case led to outsized token costs for users. In an extreme case, if you had 900k tokens in your context window, then idled for an hour, then sent a message, that would be >900k tokens written to cache all at once, which would eat up a significant % of your rate limits, especially for Pro users.

We tried a few different approaches to improve this UX:

1. Educating users on X/social

2. Adding an in-product tip to recommend running /clear when re-visiting old conversations (we shipped a few iterations of this)

3. Eliding parts of the context after idle: old tool results, old messages, thinking. Of these, thinking performed the best, and when we shipped it, that's when we unintentionally introduced the bug in the blog post.

Hope this is helpful. Happy to answer any questions if you have.

➕ show 36 replies

tadfisher • yesterday at 6:53 PM

It astounds me that a company valued in the hundreds-of-billions-of-dollars has written this. One of the following must be true:

1. They actually believed latency reduction was worth compromising output quality for sessions that have already been long idle. Moreover, they thought doing so was better than showing a loading indicator or some other means of communicating to the user that context is being loaded.

2. What I suspect actually happened: they wanted to cost-reduce idle sessions to the bare minimum, and "latency" is a convenient-enough excuse to pass muster in a blog post explaining a resulting bug.

➕ show 3 replies

sockaddr • yesterday at 8:33 PM

Yeah this is actually quite shocking. In my earlier uses of CC I might noodle on a problem for a while, come back and update the plan, go shower, think, give CC a new piece of advice, etc. Basically treating it like a coworker. And I thought that it was a static conversation (at least on the order of a day or so). An hour is absurd IMO and makes me want to rethink whether I want to keep my anthropic plan.

seizethecheese • yesterday at 6:40 PM

It's also a bit of a fishy explanation for purging tokens older than an hour. This happens to also be their cache limit. I doubt it is incidental that this change would also dramatically drop their cost.

➕ show 1 reply

zmmmmm • yesterday at 10:03 PM

Seems like it would interact very badly with the time based usage reset. If lots of people are hitting their limit and then letting the session idle until they can come back, this wouldn't be an exception. It would almost be the default behaviour.

alt Hacker News

Replies