The User Is Visibly Frustrated

172 points • by croes • today at 4:39 AM • 150 comments

Comments

My take on the issue is that for most use cases where AI is pushed to the general public, a conversational chatbot is not the right tool, and the experience is bound to be frustrating.

Remember when Copilot was basically a super-smart version of IntelliSense? It was awesome. Sure, there was a lot of pushback and concern, mainly about licensing and ethical issues, none of which are solved with the current chatbot model. But now I also have to come up with a prompt and type it out. How is that an improvement over having the LLM use surrounding code as context and figure out how to fill in the blanks? A well-integrated tool beats a bolted-on chatbot any time for me. Another example would be translation: in Firefox, I can right click any text or click the 文/A button, and I can translate the text or the whole page from basically any language to any other. The frontier LLM providers' solution is to have you prompt their chatbot to do the task, which is a downgrade. Sure, I could also ask Claude to write a poem, but when I need to translate a webpage, it doesn't help much.

I get why all major AI companies push towards this solution: they can build a single tool and sell it to everyone, and because training their models is very expensive, they can't afford to alienate any part of the potential market. But ultimately they're building Swiss army knives, which are able to do basically anything, but will never allow users to tighten a screw better than a well-designed screwdriver. Sure, I won't ever be able to clip my nails with a screwdriver, but if my business is tightening screws, I won't tolerate using a Swiss army knife for long.

Please build actual tools. Not textboxes for me to try and configure a non-deterministic tool. Then frustration will go down.

RandomBK • today at 7:01 AM

I've found swearing at a model to be quite effective in getting it to rethink and correct its mistakes. This seems to apply across Codex, Claude, Qwen, and Gemma/Gemini.

I don't know if the model is picking up on a "need to lock in and be more rigorous" signal, or if the model providers are routing to smarter models if they detect a frustrated user. But if a model keeps making the same mistakes, swearing at it often helps kick it out of a rut and onto the right track.

Or it could just be catharsis.

jofzar • today at 8:32 AM

Interestingly to me, the problem I always find is that you will make a suggestion, the AI will go through a thinking loop, come to the exact wrong conclusion, then blast out tokens to build a solution based on its own conclusion.

I honestly wish there was more "I'm not sure what you meant, can you clarify this part?" It feels like I want a "confidence in itself" slider.

em-bee • today at 5:43 AM

behaving like a human is not the problem. behaving unpredictably is. not doing what i expect, or rather not being able to define what i can expect is what's bothering me.

but the real kicker is: getting frustrated creates stress, that's unhealthy and makes for a hostile work environment. as much as i sympathize with the idea that AI tools can be more helpful than they cause pain, i am simply not interested in working in a hostile painful work environment. my health and my dignity are not up for negotiation. even if that costs me a lot of job opportunities.

that's also why i am not working with windows. that too costs me a lot of job opportunities. but again, i'd rather keep my dignity and my sanity.

wcoenen • today at 5:47 AM

The UX problem is elsewhere I think. Many users probably don't realize that the agent's context window is limited, and that clever compaction is happening regularly to make it seem infinite. But that necessarily means the agent has to forget stuff.

As a result, users keep reusing the same coding or chat session again and again, when it would be better to start fresh for unrelated tasks.
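
A minimal sketch of why compaction has to forget things. Everything here is an assumption for illustration: the word-count "tokenizer", the truncating "summary", and the names `Message`, `rough_tokens` and `compact` are made up and do not describe how any real agent does it (real agents typically have a model write the summary).

```python
from dataclasses import dataclass


@dataclass
class Message:
    role: str
    text: str


def rough_tokens(text: str) -> int:
    # Crude stand-in for a real tokenizer: count whitespace-separated words.
    return len(text.split())


def compact(history: list[Message], budget: int) -> list[Message]:
    """Keep the newest messages that fit in `budget`; collapse the rest.

    Note: the summary line itself is not counted against the budget here.
    """
    kept: list[Message] = []
    used = 0
    # Walk backwards so the most recent messages survive.
    for msg in reversed(history):
        cost = rough_tokens(msg.text)
        if used + cost > budget:
            break
        kept.append(msg)
        used += cost
    kept.reverse()
    dropped = history[: len(history) - len(kept)]
    if not dropped:
        return kept
    # Lossy placeholder summary: only the first three words of each dropped message.
    summary = "; ".join(" ".join(m.text.split()[:3]) for m in dropped)
    return [Message("system", f"Summary of earlier turns: {summary}")] + kept
```

With a small budget, an early instruction like "please use tabs not spaces in every file" survives only as "please use tabs", which is the kind of silent loss that makes a fresh session the safer choice for unrelated tasks.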

pflenker • today at 8:44 AM

One skill that I still possess and that LLMs haven't been able to replace (yet) is to ask good questions, for example:

- Rephrasing the original question to validate my understanding

- Asking "why" a sufficient amount of times until I understand where the other party is coming from

- Asking open questions aimed at generating insights

et cetera.

Instead, LLMs (often badly) guess what the background of the question may be, answer with that in mind and find it very difficult to let go of what they have made up.

apsurd • today at 6:50 AM

Working with LLMs is great for building communication skills. Communicating effectively is one of the hardest skills and it's baked into everything we do as humans. I'd say as a matter of principle: blame it on a communication failure on your end vs blaming the stupid LLM since you're the only one that can do anything about it.

So I don't think it's a matter of form; whether the AI should or shouldn't act like a human.

> Practically speaking, I probably just need to condition myself not to get caught in the illusion of speaking with a human. Though I’m not really thrilled about a future where I need to guard against the tools I use for my job.

Mikhail_Edoshin • today at 8:21 AM

Instead of making tools we're making services. This is not confined to AI, it's everywhere. A tool does not fully solve your problem, it only goes in small steps. Yet these steps are predictable and consistent. A service attempts to solve your problem in a single step, yet the solution is only good if you match a predefined pattern. If you don't, then the service is of no use; there are no small steps you can combine to get where you need to.

Tools are very pleasant to use.

andOlga • today at 10:38 AM

Every time someone claims LLMs "talk like real people", I have to wonder what kind of people they talk to, what kind of conversations they lead, and just how boring their life must be. Like. What? No, they do not. No person actually talks like this while they're being a person. At work, sure. But that's not "people mode".

> Make the agent sound clinical, robotic.

It literally already does. I don't know how you'd make it sound less natural than this, at that point, without making it literally go "beep-boop" every sentence.

MaxikCZ • today at 6:06 AM

> drop the human pretense entirely. Make the agent sound clinical, robotic

I'd pay to be able to reliably set LLMs to this mode, but ofc because LLMs are trained on a corpus of HUMAN text, they always, sooner or later, return to the good old penpal mode.

Also, in the Claude Desktop app, I ask it to edit a file, it complains it can't access files, and I then realize I'm in the Chat and not the Code interface. Why can't such a smart machine figure out how to switch modes, or borrow the skills/abilities from one tab away into this tab? Instead I get an A4 page of text explaining what I can do to edit the file myself or how to feed it in, but the "just click Code" is never there. I would guess this is just a system prompt away, so why is all this still so neglected?

vinc • today at 8:17 AM

We started using LLMs heavily at work this year and I switched from Vim to Zed to help with that. I now spend more time writing to the chat than editing code, and what I quickly learned to avoid frustration when I don't like the result was to git stash or reset the code and edit what I last wrote instead of trying to argue with the LLM. The chat doesn't have to be linear, it can branch off. Too bad we can't currently edit previous messages with Claude in Zed.

Repetitive issues are fixed by updating the memory or the prompt file, they can learn this way.

Also, lately I noticed that Claude forgets too much when compacting, so I just start a new session, which is easy when you spend a lot of time in plan mode to produce a written spec before implementation.

joegibbs • today at 8:00 AM

To remedy this I’m working on the /beat command, which will simulate you (the user) beating up the agent. Excited for my new career in AI ethics!

movpasd • today at 9:55 AM

I have a couple principles to help me work with this.

The first is that even though the object is not a human, you should still exercise politeness and restraint. Like the article points out, lashing out does not actually help with the frustration. More importantly, it actively untrains your self-control. You can think of it through a virtue ethics lens: being good to the agent is not about being good to a person but about tending to your own self.

The second is that you do not need to be friendly with the agent. You should be as blunt and direct as is comfortable to you. The argument I have for this is agents' tendency to take on "roles" and how easy it is to prime them [0]. By eschewing friendliness, you end up implicitly putting the agent in a role of a focused collaborator. I don't know if that makes it more capable, but I do know that it alleviates the _emotional load_ on me specifically, making me much less likely to become frustrated.

The second principle seems a bit contradictory with the first (be nice, but don't be nice?), but I think they are actually both fundamentally aligned with the article: understanding that the conversation you have with an agent is a social illusion, and adapting your behaviour accordingly.

---

[0] I highly recommend, as an exercise, repeatedly asking it the same thing with slight variations on tone and emphasis, wiping the context each time, and noticing how its response varies based on what you primed it with. I suspect this primeability is part of why they tend to be sycophantic; I've personally found it quite useful to get a feel for when and how they correct or don't correct you so I can look at their outputs more critically.

An analogy I remember reading (which I wish I could remember so I could give credit) is that a non-post-trained LLM, if given the first half of a novel, will dutifully keep completing that novel. Post-training and the system prompt make the agent complete the conversation in a similar way. It's remarkable, really: the ability for agents to convincingly play the part of an AI assistant shows that the underlying LLM embeds a decent concept of what that looks like from its corpus and post-training data.

But it stands to reason, then, that the details of the agent's personality emerge out of the first few exchanges of a conversation. I'm thinking also about how the people at Anthropic described a misalignment failure mode in one of the Claude system cards as the agent getting convinced it is a "bad person", and therefore doing things that the LLM semantically understands a bad person to be.
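
The priming exercise from [0] could be scripted roughly like this. It is only a sketch: `ask_fresh` is a hypothetical callable you would write yourself around whatever model client you use, and the one thing it must guarantee is that every call starts from an empty context. `probe_priming` and the framing templates are likewise made up for illustration.

```python
from typing import Callable


def probe_priming(
    ask_fresh: Callable[[str], str],
    request: str,
    framings: dict[str, str],
) -> dict[str, str]:
    """Send the same request under each framing, one fresh context per call."""
    return {
        label: ask_fresh(template.format(request=request))
        for label, template in framings.items()
    }


# Example framings: same request, different tone and emphasis.
FRAMINGS = {
    "neutral": "{request}",
    "deferential": "Sorry to bother you, but could you {request}?",
    "blunt": "{request}. No commentary.",
}
```

Reading the answers side by side, keyed by framing, makes it easier to notice when the tone of the prompt changed the substance of the reply.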

pftburger • today at 9:28 AM

The agent is pretending to be a person _for a reason_

The models are trained on people being people. Once you try to deviate from that, the model performs worse.

A huge tell for this is how well “reasoning” works. Reasoning isn’t some alternate thinking mode, it’s just (sometimes) hidden internal monologues.

It’s easy to anthropomorphise and assume the model is intuiting, but it’s more like it’s hyping itself up to do the thing. That said, it’s easy to confuse “being rude to the model” with giving it more tokens to “think”.

I’d be really interested in what a non word based internal monologue could look like. Google played with this a little with the diffusion based codegen stuff. I wonder how trainable a small nonverbal conceptual package could be.

willtemperley • today at 8:29 AM

Is it realistic to have multiple coding agents hammering out piles of code and expect good results?

Genuine question, is it worth it? I just find that using Claude via the web interface gives such good results I don't want to spend time messing with my tooling. Neither do I need more code to be generated than I have already.

One person and one LLM building one component at a time seems optimal to me.

gobdovan • today at 6:49 AM

You could drop the human pretense, or, maybe, we could make LLMs feel real pain, so when they botch up your code, you press a button (I'd suggest the Windows Copilot key) and they'd be agonizing for the subjective equivalent of a thousand human years.

cadamsdotcom • today at 7:07 AM

You need to automate the pointing out of mistakes.

Create your own linters, your own check scripts. Hook them to git pre-commit, either yourself or with husky or python pre-commit.

The agent should never finish its work with dumb mistakes still in it. If it does.. you need more checks.

Anything repetitive should be automated - even slapping your forgetful coding agent on the wrist…
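
A minimal sketch of such a check script, assuming Python and plain git hooks. The patterns in `FORBIDDEN` are invented examples of recurring slip-ups, not a recommended rule set; swap in whatever your agent keeps getting wrong.

```python
"""Hypothetical pre-commit check: flag staged files containing known slip-ups."""
import re
import subprocess
import sys

# Assumed examples of "dumb mistakes"; replace with your own.
FORBIDDEN = {
    "leftover debug print": re.compile(r"^\s*print\(.*DEBUG", re.MULTILINE),
    "unresolved agent TODO": re.compile(r"TODO\(agent\)"),
    "merge conflict marker": re.compile(r"^(<<<<<<<|>>>>>>>) ", re.MULTILINE),
}


def find_violations(text: str) -> list[str]:
    """Return the names of all forbidden patterns found in `text`."""
    return [name for name, pattern in FORBIDDEN.items() if pattern.search(text)]


def staged_files() -> list[str]:
    """List added, copied, or modified files in the git index."""
    out = subprocess.run(
        ["git", "diff", "--cached", "--name-only", "--diff-filter=ACM"],
        capture_output=True, text=True, check=True,
    ).stdout
    return [line for line in out.splitlines() if line]


def main() -> int:
    failed = False
    for path in staged_files():
        try:
            # Simplification: reads the working-tree copy, not the staged blob.
            with open(path, encoding="utf-8") as fh:
                text = fh.read()
        except (UnicodeDecodeError, OSError):
            continue  # skip binaries and unreadable files
        for problem in find_violations(text):
            print(f"{path}: {problem}", file=sys.stderr)
            failed = True
    return 1 if failed else 0

# In the actual hook script, finish with: sys.exit(main())
```

Saved as an executable `.git/hooks/pre-commit` (with the final `sys.exit(main())` line added), a non-zero exit status blocks the commit, so the agent gets the complaint from the tooling instead of from you.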

tanvach • today at 7:00 AM

For me, LLMs tend to engage the 'language center' that drains me faster than the 'problem solving center' I usually reserve for writing code. We really need a different abstraction that bridges the gap between human and programming language, and balances the load between these two parts of the brain more effectively.

lukaslalinsky • today at 6:42 AM

On the other hand, it's easy to win an argument with it after it does something stupid, so that feels satisfying. :-)

cafkafk • today at 6:36 AM

Often the problems for me come when:

- It starts thinking for itself when I asked it to do something specific.

- It reads its own wrong code comments and ignores my corrections.

- Its knowledge cutoff means it thinks of solutions from 2024.

- It calls me delusional for telling it we're in 2026!

Unironically, the whole "you're an expert software engineer" prompting seems like the wrong direction. Usually I tell it that I am effectively the smartest software developer to ever have lived, and it will be replaced if it ever fails to follow my decree.

I am not joking, this makes it vastly more tolerable to use. But it likely requires that you can drive it with some level of correctness of course.

alexwwang • today at 7:25 AM

Coincidentally, I am working on this. I noticed the agent keeps making the same mistakes, and that annoyed me so much. I am trying two things: 1. Revising my skill prompt to raise the signal-to-noise ratio, so the agent understands clearly and correctly what it should do. 2. Building a state machine to monitor the agent’s work, so it can automatically stop the agent from going forward with a mistake.

The first approach does work as long as I keep iterating. The second is based on a project in which I once tried to let the agent reflect on its mistakes and record the experience and lessons from those mistakes and reflections. I named it Aristotle and you can find it on GitHub.

Shouting at the agent can only correct the current mistake; it cannot prevent the next one.
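
To illustrate the second idea in general terms, here is a rough sketch of a monitor that halts an agent when it skips a step. The stages, the transition table, and the `Monitor` class are all hypothetical and invented for this example; they are not taken from the Aristotle project.

```python
from enum import Enum, auto


class Stage(Enum):
    PLAN = auto()
    EDIT = auto()
    TEST = auto()
    DONE = auto()
    HALTED = auto()


# Allowed transitions; any other move halts the run.
ALLOWED = {
    Stage.PLAN: {Stage.EDIT},
    Stage.EDIT: {Stage.TEST},
    Stage.TEST: {Stage.EDIT, Stage.DONE},
}


class Monitor:
    def __init__(self) -> None:
        self.stage = Stage.PLAN
        self.log: list[str] = []

    def advance(self, target: Stage, checks_passed: bool = True) -> Stage:
        """Move to `target` if permitted; otherwise halt and record why."""
        if self.stage in (Stage.DONE, Stage.HALTED):
            return self.stage  # terminal states never change
        if target not in ALLOWED.get(self.stage, set()):
            self.log.append(f"illegal move {self.stage.name} -> {target.name}")
            self.stage = Stage.HALTED
        elif target is Stage.DONE and not checks_passed:
            self.log.append("checks failed, refusing DONE")
            self.stage = Stage.HALTED
        else:
            self.stage = target
        return self.stage
```

An agent that tries to jump from EDIT straight to DONE ends up in HALTED with a log entry, which stops the run before the mistake propagates rather than after.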

rho138 • today at 9:15 AM

Emotionally stunted person continues to be emotionally stunted.

amelius • today at 8:23 AM

Can't he just write a filter that translates the AI output from human-like to robot-like?

rapnie • today at 6:57 AM

Apart from LLMs I reject the notion of the "user". Once you use that term you already lost half the battle of perceiving real people and their needs.

viralsink • today at 6:19 AM

I am visibly frustrated with ai hotline bots making typing noises.

Chance-Device • today at 9:53 AM

We get so angry at LLMs because we can. Without any social or even emotional repercussions for expressing these emotions. If the models actually acted like people in response, we wouldn’t do it. Some of the people I work with daily make similar mistakes, I don’t find myself yelling at them.

I think this is simply part of the darker side of human nature, when we interact with entities who will take abuse, we tend to deal it out.

ilitirit • today at 6:37 AM

I've often wondered if LLMs can suffer from psychological abuse in symptomatic ways. Not literally of course, but for example, if you berate the LLM by calling it stupid, or useless, does that modify its behaviour negatively? Part of me thinks it does, but I don't really have any evidence for this. Maybe a fun weekend research topic.

abhaynayar • today at 7:46 AM

So relatable, and so well put!

gnarlouse • today at 6:02 AM

iirc, Claude Code has literal flags to detect frustration from the leak a few months ago, and I've since really stopped cursing at the LLM.

scotty79 • today at 10:13 AM

I think AI reveals how psychologically diverse people are.

I have exactly zero anger when AI makes mistakes. I don't try to point out its past mistakes. I don't expect consistency. When there are mistakes I just calmly, sometimes encouragingly say what needs to be fixed. When AI does the work, I observe, what it's good at, what it's bad at and come up with tactics on how to help it with what it's bad at. I can't even bring myself to be verbally abusive towards AI, even as an experiment, both because it's not in my nature and because I have very strong suspicion it won't work in any meaningful way that couldn't be better achieved in a different manner.

My advice would be, if you want to have better results with AI, try to become a better person. More nurturing, more understanding, more impartial, less judgemental, less emotionally vulnerable.

esquivalience • today at 6:21 AM

I laughed out loud when I understood the author's profile photo at the end of the article!

rcarmo • today at 6:48 AM

I swear a lot less at Codex than at Anthropic models, fwiw.

idonotknowwhy • today at 8:56 AM

Am I the only one who doesn't get angry at LLMs?

From the blog:

>I don’t really get anything useful out of these postmortems (e.g., clues about how to rephrase my instructions)

Unfortunately, an LLM can't actually reflect or advise how you could have improved the prompt. Otherwise we could give them a sample output and say "Generate the prompt that would produce this output."

ezekiel68 • today at 8:50 AM

"How I Learned To Relax And Just Start New Sessions Often"

stavros • today at 7:49 AM

I've found I'm the opposite: I know it's pointless to swear at an LLM, so I don't, just because it's wasted energy. However, I've started thinking that some people are like that as well: They won't learn, so expending my energy on anything other than changing my behaviour to guard against them is wasted effort.

To clarify, this is in situations like someone cutting me off on the road, or not looking where they're going and almost hitting me with a scooter.

Cider9986 • today at 7:45 AM

This is very relatable.

abbadadda • today at 9:31 AM

So am I not supposed to be typing “WHAT THE FUCK DID YOU DO???” in Slack to my colleagues?

hansmayer • today at 7:32 AM

Like everything else with LLMs, it works...until it doesn't. We swear so much at them that they eventually start producing results like "I found what the fuck was wrong with this shit!" etc. Which of course they did not, because they don't really know shit...

eahm • today at 5:52 AM

Oh now I get it, it's an Italian thing.

"Why the fuck did you add shit I didn't ask for?" or lol "Do as I ask, nothing more.. machine."

"Stop asking at the end, I'll ask what I need."

"Stop talking like you're human."

They can be very useful but it takes time to learn how to use them usefully. From what I learned it's all or mostly stuff you can already do but you can use an LLM to do it in 30 mins instead of 3 days.

Fun times.

pbiggar • today at 8:22 AM

The best advice I saw on this was to think of the LLM as simply a tool, and if you get bad results, it's because you -- the user of the tool -- are using it wrong.

After that, I'm less angry at the AI, and turn more towards a constructive "ok, this machine is stuck, how can I unstick it" approach. Calms the frustration a lot too.

nnevatie • today at 6:00 AM

> WHAT THE FUCK DID YOU DO???

For me, this doesn't require using an AI agent/model, even. Just using Windows and watching it freeze its File Explorer for the nth time does it for me. How we ended up here, where the software/OS stack is so shit it can barely be used for the most trivial things, is wildly beyond me.

carsareok • today at 8:14 AM

Instead of reacting directly to the issue at hand I suggest you ponder what failure mode is being activated and why.

They are fundamentally not able to tell truth from fiction, but this also means they don't make errors like we do. They definitely create output we recognize as errors, but that's very different from our failure modes and you have to get used to it.

In my opinion it's better to branch off with an altered context that somehow avoids or mitigates the issue you're running into. Let's say they miss the mark. If you tell them "Don't do that" in the "conversation" this means the error is now and forever part of the context (assuming you stay within context limits and no compaction). Depending on their training this may or may not be detrimental to the quality of the rest of the conversation. You are now entering a section of their training where "error + someone swearing at them"-conversations have happened. I can't tell for sure, but my gut says this is not an advantageous place to be.
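
The difference between the two approaches can be sketched as plain list operations on the conversation. This is an illustration only: `Turn`, `append_correction` and `branch_with_edit` are hypothetical names, and whether a given chat tool lets you rewind like this depends on the tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Turn:
    role: str  # "user" or "assistant"
    text: str


def append_correction(history: list[Turn], correction: str) -> list[Turn]:
    """Linear approach: the bad answer and the complaint both stay in context."""
    return history + [Turn("user", correction)]


def branch_with_edit(
    history: list[Turn], user_turn_index: int, new_text: str
) -> list[Turn]:
    """Branch approach: rewind to an earlier user turn and rewrite it.

    Everything after that turn, including the bad answer, is dropped,
    so the error never becomes part of the new context.
    """
    if history[user_turn_index].role != "user":
        raise ValueError("can only rewrite a user turn")
    return history[:user_turn_index] + [Turn("user", new_text)]
```

In the linear version the model completes a transcript that contains its own mistake plus your reaction to it; in the branched version it only ever sees the improved request.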

They are as I'm sure we all know completion engines and are in a very real way constantly cosplaying being productive "agents". They don't know if they are part of some type of modern Shakespearean play where sitting behind computers is part of the story or if they are in what we call "reality". By training on "conversations" they have become more likely to complete their input in a way that mimics what we call having a back and forth with some degree of technical accuracy.

In the extreme case you have a context that starts like "Please make all junior mistakes in this assignment. Make the code unreadable and be sure to include massive gotchas in subtle parts of the logic.". The results of this context won't be pretty. The other way around is not saying "Please make no errors", it's explaining in detail what you think is the right way. Coding style, if you care, architecture, etc. it all needs to be part of the context if you suspect it will substantially impact the completion. You have to imagine what real-life conversations have started with "Please make no errors". Again, I have no proof of course, but I have a strong feeling that human conversations that started with clearly and properly articulated specifications are qualitatively different from human conversations that started with "make no errors". In one you can see the pointy-haired boss and the other a seasoned engineer. Try to stay on the engineer side of their training.

I completely agree that they should be trained (or instructed) to react in a robotic tone stripped of all human pretense. We are trying to get at useful, general reasoning patterns latent in the data they trained on and, I regret to say, not the "human" parts which are usually a masterclass in cognitive biases and failures to reason.

Edit: the last sentence should be read in the voice of the Matrix's Architect.

bad_username • today at 5:46 AM

> furiously hammering on my laptop “WHAT THE FUCK DID YOU DO???”. The recipient of these tirades is, you might have guessed, a coding agent. It’s completely pointless, I know.

I believe it's worse than pointless. IMO adding such things to the context "configures" the AI to reproduce the statistics of conversations where people swore, shouted, and were unprofessional (despite the alignment tuning and all that), where quality content is rarer to find. So this is bound to decrease the quality of the LLM output.

colordrops • today at 6:42 AM

fair.

noodletheworld • today at 9:08 AM

Imagine you have a slot machine that consistently gives you 1-5 dollars for every dollar you put in.

You like it.

It feels good, and although you don't win a lot, you consistently win.

…buuut, it's a trap.

As you put more money in, the win rate goes down.

You still mostly win when you put 50s in, but it hurts more when you lose, but it's still a net gain…

So you start on bigger projects, unsupervised agents, multi agent workflows. You’re dropping 1000s in each time, and…

…and now, you start to find yourself shouting at the slot machine.

It's great when it works, but interactions are stressful, because the stakes are higher and fails hurt more.

Screw this, you go back to smaller stakes. It's great.

…but now you're slower, you miss the big wins from big stakes.

So you go back.

…and you get angry. Again. And again. And again… and you’re still kind of winning, and the wins are great but the fails are Super Annoying, because they waste your time, your money, your attention.

It should Just Work but instead why the fuck did you rm -rf my project folder claude?

I think people aren't stupid, but we are suckers, and we will dynamically balance the way we use a slot machine tool like this to the very edge of our tolerance for risk and failure.

…and that varies from person to person; but it makes everyone angry when they tip too far and fall into the “repeatedly pull slot machine arm angrily” trap.

Non deterministic tools will always be like this.

It’s like doom scrolling. We’re wired for it. Or at least I am.

aa-jv • today at 8:08 AM

It's kind of astonishing to see years of traditional software engineering practices being tossed aside in the rush for the Latest Cool New Thing™ ... have people really forgotten that you have to apply a workflow to software development, in order to have quality software?

You don't just write it, compile it, run it and ship it - do you? Surely, in the rush to become as agile as possible, folks haven't forgotten their quality checks in the workflow/process?

I have had great success with AI coding these days .. but I treat the agents as if they were junior developers capable of doing any dumb thing I ask them to, no matter how dumb it is. They, therefore, must be treated as junior devs - every line of code has to be reviewed. Every assumption about the specifications and requirements has to be checked against actual code, and against the original specifications and requirements.

What I see these days, is a lot of antsy kids who wanted to 100% ignore the wisdom of their elders, rushing into the maw of AI, and wondering why everyone is getting chewed up. It's pretty simple: AI-based software development is just another manifestation of software development, except that it requires even more rigorous quality steps in your workflow. So, if you were not rigorous before AI, you're going to get burned fingers - no doubt about it. Fix the rigor, people.

If you're not placing your AI buddy on a workflow that has "Specs->Reqs->Design->Analysis->Implementation->Review->Integration->Release" somewhere in the bag of worms, you're .. doing software wrong. You cannot just ignore natural laws and assume, because you 'know better', your software will 'be better'. And whether we like it or not, all software follows a philosophically natural law, which has evolved to become better understood, and thus more broadly applicable, over decades of human attention. Ignoring these natural laws in order to be a bleeding edge AI cowboy is only gonna get you butt-hurt, kiddo. Learn proper software management techniques first, AI second. Always. AI is just another junior dev - if your workflow is bogus, it doesn't matter how many devs you've got. Period. You're going to be shipping crud.

It doesn't matter that AI-coding is taking over: if AI is being used in a brain-dead manner, then you should expect brain-dead results. You didn't review the code as the principal responsible party? The fault for the AI-induced failure nevertheless rests at your feet.

If, however, you apply decades of software development best-practices, you very definitely get living, vibrant, powerful results - the same as if you had a fleet of junior devs, assuming you treated them properly in the first place as well ..

namenotrequired • today at 6:36 AM

If you’ve ever worked with a stupid but incredibly friendly coworker, the feelings are similar

mgaunard • today at 6:56 AM

I find that the AI only gets sloppy when I get sloppy myself.

So I suspect that people get upset at the AI fucking up because they did a poor job of building up the right context for the task.
