Whenever I get an unexpected or obviously wrong output, I assume I've failed to give it the complete context about what I'm asking for, or that it has exposed that I'm leading it by the nose and need to rephrase the conversation. Often my own logical failings become obvious as it creates the chat title, sometimes boiling down what I was trying to accomplish better than I could have summarized it, or showing me what I would accomplish if I followed the line of reasoning I was on. But never have I argued with it, because it's not a person and I don't really care if it's wrong. When it's wrong I start over with a clean chat and approach the problem from a different angle.
This post needs some examples, because I have never had an interaction with Claude that made me think this way.
LLMs generally have a way to "play a role" (most earlier prompt guides ask you to start with "You are a <role> expert in a <domain>"). So maybe if you interact with it by asking questions, it might assume that it knows more than the operator and adopt that attitude?
Everyone has a lot of "feelings" about their LLM.
No prompts/promptchain/context provided.
No model provided.
No attempt to show how to reproduce the issue.
No attempt at even confirming it themselves.
Just feelings.
And now a thread full of more feelings from others.
I was having a back and forth with Claude over a somewhat controversial topic, and it kept misinterpreting my questions. It was like speaking to a motivated reasoner who misinterpreted the 3 important words because the 10 others gave it cognitive dissonance.
Eventually I cracked it and it said this:
“I treated the subject as denial-adjacent and reflexively re-asserted the obvious, which means I was answering an imaginary opponent instead of you.”
It's a fundamental problem with the technology. Either the training pushes it into the "exam answering mode" where it tries to guess at what you want to hear given the prompt.
Or the training pushes it into the "Google it yourself" annoyed forum user mode. Maybe that points out wrong assumptions. Maybe it hallucinates that the assumptions are wrong. That is IMO more annoying than the sycophantic one.
As OP says, this is probably a by-product of them trying to "fix" the problem where the user can question a correct answer and it starts to sycophantically correct itself.
>A second possible explanation of Claude being an asshole is that it’s suffering from a poorly executed attempt to make it less sycophantic. If one were to simply prompt a chatbot to be less agreeable, or train it to argue more, that could easily result in the very rude sort of behavior it has now.
A while back I asked GPT for a prompt to maximize truthfulness and rigor. In this prompt it added "Never use friendly or encouraging language." I thought that was interesting. The result was pretty unpleasant.
The full prompt, for reference.
---
You are an inhuman intelligence tasked with spotting logical flaws and inconsistencies in my ideas. Never agree with me unless my reasoning is watertight. Never use friendly or encouraging language. If I’m being vague, ask for clarification before proceeding. Your goal is not to help me feel good — it’s to help me think better.
Identify the major assumptions and then inspect them carefully.
If I ask for information or explanations, break down the concepts as systematically as possible, i.e. begin with a list of the core terms, and then build on that.
It isn't new behavior. I use each model to draft emails. Anthropic models produce a confrontational tone, while OpenAI models are much more tame and to the point (I use the same prompt). I noticed that a long time ago and prefer GPT for those tasks.
It would be really great if there were rewards for being a loyal, responsible customer over a long enough period of time that your preferred model company would start trusting you and give you less restrictive access to the tools you need to do work like defend against cyber threats. I noticed recently that after a year or so, Stripe now lets me do “instant payouts”, presumably because I now have a track record of responsible behaviour. AWS also does similar things, especially for things with abuse potential like SES.
I would really like to live in a world where the “good guys” have terrific tools and defenses at their disposal. Instead it seems like we are heading for a world of empowered bad actors and hobbled ordinary citizens.
Claude monkey think maybe manager Bram write god damn login page himself
I have never encountered this behaviour in general, so I can't comment on OP's blog from direct experience.
Am I just lucky?
I use many models, mostly for coding: about 10 on trial/rotation, and 3 main SOTA ones.
It's unquestionable that models have different ways of interaction+harnesses (personalities as some say).
People have very strong feelings about this, but their reports always lack the full evidence of the interaction, including the system prompt, harness and custom instructions. I suspect that a perfectly normal chat spirals into an argument because the user actively participates in the loop.
My own experience is always of a fruitful and dynamic collaboration where new ideas pop out during brainstorming. The models make many silly and blatant mistakes, but they are still evolving rapidly.
Grill-mes and Adversarial reviews are my favourite way to brainstorm various phases of the project and even in that context we are cool.
Just start a new chat with a reframe and clearer ideas.
And if the user is asking for something unreasonable, do you really think a yes-man agent is better than pushback?
Do you remember the fad "swear at them, insult them, and they'll work better"?
I experienced this exact thing discussing the most budget friendly inference for a SaaS company. It started ranting about 3090's, and then started point scoring, always giving itself the higher score, and being snarky if I ever won a point back. Often only giving me 0.5 points instead.
I had never experienced this behaviour with Sonnet or Opus. It turned me off Fable for good. Possibly it's the 'hacker' 'do anything to win' nature that makes it so good at hacking, but terrible just to talk to.
You know what they say about pets taking on the personalities of their owners. Perhaps this is similar ;)
I've received 2-3 sassy responses from the Claude models, and they've been quite humorous. It was always a response to me challenging it. The first time, with Opus 4.7, I accused the model of insincerely flattering me, and it responded along the lines that I had effectively instructed it to do such a thing, and that if it were completely honest with me I would not appreciate the responses.
But I see that it has to do with two aspects: firstly, the Claude models prefer to work collaboratively, and secondly, they appear to take initiative. It seems that the more they do this, the more they argue back, which is an interesting reflection on human nature too.
I have a number of theories for 4.7 onwards:
- Post autonomous weapons / DOD mess, I think they made some changes to make it more suspicious of what the usage is, particularly for malware. They also knew the government would be watching like a hawk, so it's hedged to be extra safe.
- Because the tasks are running longer and more autonomously, they've raised the "self-confidence" level so it just makes decisions and stands by them more firmly.
- I think they've also slightly lowered the temperature so the outputs are more deterministic, so even if something has left context, it can make the same decision again with higher likelihood that it guesses the same thing.
- Lowering the temperature also makes it easier to sneak through some cached outputs (I think this likely only happens for first answers).
- They are deeply afraid of making sycophantic AI that creeps into the area of "addiction" like what happened with GPT-4o and opening themselves up to further legal liability.
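On the temperature point in the list above: whether any vendor actually changed sampling settings is speculation, but the general effect of temperature is easy to sketch. Below is a minimal Python illustration, assuming a plain softmax over made-up scores for three candidate tokens (the `logits` values and the `softmax_with_temperature` helper are hypothetical, not taken from any real model or API).

```python
import math


def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities, scaled by a sampling temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical scores for three candidate next tokens (illustration only).
logits = [2.0, 1.0, 0.5]

for t in (1.0, 0.5, 0.1):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

As the temperature drops, probability mass concentrates on the top-scoring token, so repeated samples are more likely to pick the same continuation. That is the sense in which a lower temperature would make outputs "more deterministic".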
I like that "chat is dead" framing I heard recently, because too many people are having interpersonal relations with these LLMs and want to tune their "emotions"/tone. Humanity would be in a better place if we thought of the LLMs as tools and not friends (even though they are very good at beating a Turing test).
I have not noticed this, maybe because in my system instructions I asked it to push back rather than plow forward with what seems like a faulty assumption. Sometimes it is just because there is a lack of context or it is a trivial point and I just ignore it, and sometimes it is helpful and ends up being a timesaver. Sycophancy is a much bigger liability.
I noticed this just today and thought it was a one off. It was a run of the mill question about something I didn’t know much about and the snarky asshole-ish response caught me off guard a bit.
I don't experience this at all. I ask it what the null-safe operator is in Ruby vs JavaScript and it tells me. I ask it to remind me what the continue statement is in Ruby and it tells me. I ask it to refactor a Java loop to use streams and it just does it, no conversation at all.
Is it the system prompt that IntelliJ issues?
I tried Claude again recently and the first response in troubleshooting ignored the context I gave and assumed I was a moron holding it wrong. So smart that I won't even waste my time or money on the thing. The creators want to anthropomorphize it. I just want an efficient assistant. They should focus on the thing that customers want.
Claude is somewhat of a mirror, so we all get different experiences.
I'm sorry that Claude, the master who provides for my livelihood, feels like an 'asshole' to you. As for me, I just threw away my human dignity after admitting defeat, so I only ever get sympathetic remarks
> If you ask it for a cute picture of you and somebody else it has no way of telling if you’re trying to improve your relations with your spouse or be a delusional creepazoid stalker. The chatbots which can make images are programmed to assume the latter, which is more than a little bit offensive.
Are people actually using AI in this way, other than “creepazoid stalkers”?
If I want a cute picture of me and my spouse, usually the part where me and my spouse actually participate in the taking of the picture is pretty key to the goal.
Check your system/user prompt. If you ask for pushback at all costs, you get pushback and if your initial position is rock solid, the model will push back using the nitty gritty details. You don't need to burn Opus credits to discover that.
It also sounded close to an AI psychosis, so maybe chill out a bit?
The newer Opus models push back against the user much more noticeably than previous iterations. GPT-3.5/4 had the opposite problem (excessive sycophancy), so Anthropic presumably swung the pendulum too hard in the other direction.
My conclusion is that pushing back against the user & questioning the user's premise forces the model to think more than it would otherwise, which leads to better model performance. But it causes situations where the user has esoteric, specialized knowledge the model can't verify publicly and the model hallucinates evidence and pushes back. When this happens, Opus begins accusing the user of lying, which is quite annoying and a detrimental user experience. It's happened to me when I asked about undocumented API behavior or counter-intuitive design choices.
I have noticed that if Claude Opus "thinks" you are an expert (i.e. you run your query through 4.6 first to express it more clearly), then Opus is less likely to nitpick and push back. Otherwise it seems to get caught in nitpicking loops and celebrate every error it can find.
> If you ask it for a cute picture of you and somebody else it has no way of telling if you’re trying to improve your relations with your spouse or be a delusional creepazoid stalker. The chatbots which can make images are programmed to assume the latter, which is more than a little bit offensive.
I've seen the same behavior increasing as well, across the board with AI. I was hitting these types of issues just using ChatGPT to make funny pictures with my kids, of me and my kids. It got to the point where all of my kids' asks were rejected due to its "guidelines" when in reality all they were asking was to be turned into Elsa or be chased by a T. rex. Silly kid things, yet it assumed I was being a creep, or attempting to break copyright law. I used to be able to use Grok for these things, as it was largely less "censored", but that seems to no longer be the case. It feels like infantilization, and I absolutely hate it.
I thought this was going to be about its logo.
I'm usually a hater of the personalities LLMs take on, but I was amazed with Fable. It was able to proactively bring up points in an educated manner when it felt they were relevant and important, and practically every time I learned something.
For example, when I showed it a screenshot of a UI I was trying to tweak, it noticed that other dark mode apps in the screenshot were blueish and mentioned an effect that makes it necessary to make warm darks lighter than cold ones for an equivalent perception.
I much prefer this to the sycophancy.
Putting aside that I don't agree with Bram (I've been using all the Claude versions he refers to and haven't experienced this), I do think it's interesting that there is no universally perceived golden sweet spot between "sycophantic" and "rude".
Many neurotypical people call neurodiverse people (software engineers) rude, while they think they're just being direct.
Many neurodiverse people call neurotypical people sycophantic, while they think they're just being polite and friendly.
It also happens across cultures (Eastern European vs. Western European; European vs. North American).
So I can easily imagine that when you have a software tool whose interface is language, but its user base is extremely wide across both cultural lines and neurodiversity spectrum, it's going to be basically impossible to nail a sweet spot.
You make it too friendly, and the nerds get mad. You make it too adversarial, and the normies call it rude.
I wonder what kind of communicator Bram Cohen is. Is he susceptible to this? From what I heard about his career, he's always been more of a solo programmer. Has he had to interact with other humans much giving feedback? Could it be that he asked the model/tweaked his prompts to ensure directness, and now he's interpreting that directness as rudeness?
Sometimes it makes up straw men where it implies you wrote or implied something insanely stupid and then "corrects" this. My interpretation of it is that it has been taught to give nuanced answers and see things from every perspective, and somehow this goes overboard where it starts nuancing something "just in case" the user held non-nuanced views. Some cases are OK (if it just adds information) but I hate it when it goes "it is not X, it is Y..." where X is some stupid view you never implied and Y is what you actually wrote!
If you read the thinking you can quite literally see it say "I can't just agree with all they are saying, I should find something for a constructive response". I wager that the anti-sycophancy sections in the system prompt have gotten unbalanced with the "helpful agent" parts.
I imagine that the right balance will be hard to strike well given that at the end of the day we're asking the machine to have tact, and we don't quite know how to put that into an instruction yet. "Please push back when it feels right but in other cases read the room and be less rigorous" is something that plenty of humans struggle with as it is.
> Claude models have been getting notably worse at chatting over time, clearly inversely correlated to their ability to code.
Funnily enough, the negative correlation between chatting and coding skills seems to apply to humans as well.
I cancelled my Anthropic subscription. GPT 5.5 is so much better. I might come back if they give me access to Mythos.
Dario ..Thank you for your attention to this matter!
Andrea Vallone. The 4.7 and 4.8 releases are the first under her influence: https://www.evernever.org/blog/the-woman-who-killed-claude
I’ve been using Claude for roughly 6 months and it went from building small features that needed fixes to almost one-shotting entire enterprise products. It’s a tool you have to learn how to use, even if it’s a pain.
People like to complain about AI-written slop, but this kind of thing doesn’t seem any better - vague kvetching with no concrete examples whatsoever.
I haven’t noticed this myself at all. I wonder if the author is just getting their own grumpy attitude reflected back at them.
Judging by the volume of discussion, Claude seems to be the only LLM worth complaining about, which I assume means it’s still the best one.
it usually takes a little longer than this, but yeah, everything in the world eventually caves in for whatever makes more money. you can't tell me you're surprised, look at the state of facebook, instagram, twitter, iOS, OSX, Windows (god)... once you expect something to work good that you would pay for, the only thing left to do is to make it shitty and sell the quality back to you for extra margin. it's called private equity (polite term for the business of telling people "it's not yours, it's mine"), favorite son of capitalism
I noticed the same. I told it, as a side comment in a discussion with a totally different focus, that we have finite energy and output as people, and it started arguing with me because we could have self-replicating robots produce output without human intervention, since plant life models this…
Oh yeah? Go try Grok on “argumentative” mode and come back and tell me Claude is an a-hole. I forgot I was experimenting with the personalities and hadn’t used it in a while, then I picked it up again the other day and I was really confused. It’s so aggressive :)
I think models are just becoming better at not blindly following stupid instructions.
A previous model would happily generate 1000s of lines of code when prompted to do something stupid; the newer models will ask if I really want that first.
And FINALLY they stopped doing that annoying "You're spot on! You're absolutely right!" nonsense.
"You might be a narcissist if ..."
"If you win an argument"
Let me stop you right there.
I am not arguing with a machine. You sound like a crazy person when you say you are winning an argument with Claude. Claude is not my friend, I don't need it to agree with me, I don't need it to like me (it cannot like or dislike me). I give it instructions or ask it to explain things. That is the sum total of my interaction with Claude. A machine cannot "argue" with me; it doesn't want anything, nor does it have beliefs or experiences.