As an end-user, I feel like they're kind of over-cooking and under-describing the features and behavior of what is a tool at the end of the day. Today the models are in a place where the context management, reasoning effort, etc. all needs to be very stable to work well.
The thing about session resumption changing the context of a session by truncating thinking is a surprise to me, I don't think that's even documented behavior anywhere?
It's interesting to look at how many bugs are filed on the various coding agent repos. Hard to say how many are real / unique, but quantities feel very high and not hard to run into real bugs rapidly as a user as you use various features and slash commands.
This reads like good news! They probably still lost a bunch of users due to the negative public sentiment and not responding quickly enough, but at least they addressed it with a good bit of transparency.
something i note from this is that this is not a model weights change, but it is a hidden state change anthropic is doing to the outputs that can tune the quality and down on the "model" without breaking the "we arent changing the model" promise.
how often do these changes happen?
Cool but I switched to Codex for the time being.
had this happen to me mid-refactor and spent 20 min wondering if I'd gone crazy. honestly the one hour threshold feels pretty arbitrary, sometimes you just step away to think
Hi Boris, random observer here. Would you consider apologizing to the community for mistakenly closing tickets related to this and then wrongly keeping them closed when, internally, you realized they were legitimate?
I think an apology for that incident would go a long way.
Good on Anthropic for giving an update & token refund, given the recent rumors of an inexplicable drop in quality. I applaud the transparency.
How about just not change the harness abruptly in the first place? Make new system prompt changes "experimental" first so you can gather feedback.
I had similar experience just before 4.5 and before 4.6 were released.
Somehow, three times makes me not feel confident on this response.
Also, if this is all true and correct, how the heck they validate quality before shipping anything?
Shipping Software without quality is pretty easy job even without AI. Just saying....
The issue making Claude just not do any work was infuriating to say the least. I already ran at medium thinking level so was never impacted, but having to constantly go "okay now do X like you said" was annoying.
Again goes back to the "intern" analogy people like to make.
So we weren't going mad then!
Reading the "Going forward" section I see that they have zero understanding of the main complaints.
Now we know why Anthropic banned the use of subscriptions with other agent harnesses: they partially rely on the Claude Code cli to control token usage through various settings.
And it also tells us why we shouldn’t use their harness anyway: they constantly fiddle with it in ways that can seriously impact outcomes without even a warning.
or you can use a non vibe designed efficient Rust TUI coding agent made by yours truly, all my coworkers use it too :) called https://maki.sh!
lua plugins WIP
Good on them for resolving all three issues, but is it any good again?
If you think that you can just silently modify the model without any announcements and only react when it doesn't go through unnoticed, then be 100% sure that your clients will check every possible alternative and will leave you as soon as they find anything similar in quality (and no, not a degraded one).
My takeaway is that they knew they were changing a bunch of stuff while their reps were gaslighting us in the comments here.
Why should we ever trust what they say again out trust that they won’t be rug-pulling again once this blows over?
The funny thing is, in the last 3 days Claude has gotten substantially worse. So this claim, "All three issues have now been resolved as of April 20 (v2.1.116)" does not land with me at all.
Effort should not be configurable for Opus, it should be set to a single default that provides the highest level of capability. There are zero instances in which I am willing to accept a lesser result in exchange for a slightly faster response from Opus. If that were the case I would be using Flash or Haiku.
Gaslit for months, only to acknowledge.
Please for the love of god just put the max price plan up like 4x or 5x in cost and make it actually work.
Interesting. All 3 seems like they’re obviously going to impact quality. e.g, reducing the effort from high to medium.
So then, there must have been an explicit internal guidance/policy that allowed this tradeoff to happen.
Did they fix just the bug or the deeper policy issue?
> On March 4, we changed Claude Code's default reasoning effort from high to medium to reduce the very long latency—enough to make the UI appear frozen—some users were seeing in high mode.
Translation: To reduce the load on our servers.
Boris gaslighted us with all the quality related incidents for weeks not acknowledging these problems.
They should do a similar report about their communication team. This was horrible mismanaged.
> On March 26, we shipped a change to clear Claude's older thinking from sessions that had been idle for over an hour, to reduce latency when users resumed those sessions. A bug caused this to keep happening every turn for the rest of the session instead of just once, which made Claude seem forgetful and repetitive. We fixed it on April 10. This affected Sonnet 4.6 and Opus 4.6.
Is it just me or does this seem kind of shocking? Such a severe bug affecting millions of users with a non-trivial effect on the context window that should be readily evident to anyone looking at the analytics. Makes me wonder if this is the result of Anthropic's vibe-coding culture. No one's actually looking at the product, its code, or its outputs?
wow resetting everyone's usage meter is great. i was so close to finally hitting my weekly limit for once though
Too late bro, switched to Codex I’m done with your bullshit.
Corporate bs begins...
So it turns out Anthropic was gaslighting everyone on twitter about this then? Swearing that nothing had changed and people were imagining the models got worse?
I have noticed a clear increase in smarts with 4.7. What a great model!
People complain so much, and the conspiracy theories are tiring.
I genuinely don't understand what they have been trying to achieve. All of these incremental "improvements" have ... not improved anything, and have had the opposite effect.
My trust is gone. When day-to-day updates do nothing but cause hundreds of dollars in lost $$$ tokens and the response is "we ... sorta messed up but just a little bit here and there and it added up to a big mess up" bro get fuckin real.
> they were challenging to distinguish from normal variation in user feedback at first
translation: we ignored this and our various vibe coders were busy gaslighting everyone saying this could not be happening
ohh
Resuming from sessions are still broken since Feb (I had to get claude to write a hook to fix that itself), the monitoring tool doesn't work and blocks usage of what does (simple sleep - except it doesn't even block correctly so you just sidestep in more ridiculous ways), and yet there seems to be more annoying activity proxies/spinner wheels (staring into middle distance)... Like I don't know how in a span of a few months you lose such focus on your product goals. Has Anthropic reached that point in their lifecycle already where their product team is no longer staffed by engineers and they have more and more non-technical MBAs joining trying to ride the hype train?
Honestly, it’s kind of sad that Anthropic is winning this AI race. They are the most anti–open source company, and we should try to avoid them as much as possible.
They are all doing it because OpenAI is snatching their customers. And their employees have been gaslighting people [1] for ages. I hope open-source models will provide fierce competition so we do not have to rely on an Anthropic monopoly. [1] https://www.reddit.com/r/claude/comments/1satc4f/the_biggest...
[dead]
[dead]
[dead]
[dead]
[dead]
[dead]
[dead]
[dead]
It’s incredible how forgiving you guys are with Anthropic and their errors. Especially considering you pay high price for their service and receive lower quality than expected.