Hacker News

keiferski today at 1:46 PM (36 replies)

The thing that bothers me the most about LLMs is how they never seem to understand "the flow" of an actual conversation between humans. When I ask a person something, I expect them to give me a short reply which includes another question/asks for details/clarification. A conversation is thus an ongoing "dance" where the questioner and answerer gradually arrive at the same shared meaning.

LLMs don't do this. Instead, every question is immediately answered with extreme confidence and a paragraph or more of text. I know you can minimize this by configuring the settings on your account, but to me it just highlights how it's not operating in a way remotely similar to the human-human one I mentioned above. I constantly find myself saying, "No, I meant [concept] in this way, not that way," and then getting annoyed at the robot because it's masquerading as a human.


Replies

ryandrake today at 3:48 PM

LLMs all behave as if they are semi-competent (yet eager, ambitious, and career-minded) interns or administrative assistants, working for a powerful CEO-founder. All sycophancy, confidence and positive energy. "You're absolutely right!" "Here's the answer you are looking for!" "Let me do that for you immediately!" "Here is everything I know about what you just mentioned." Never admitting a mistake unless you directly point it out, and then all sorry-this and apologize-that and "here's the actual answer!" It's exactly the kind of personality you always see bubbling up into the orbit of a rich and powerful tech CEO.

No surprise that these products are all dreamt up by powerful tech CEOs who are used to all of their human interactions being with servile people-pleasers. I bet each and every one of them is subtly or overtly shaped by feedback from executives about how they should respond in conversation.

jodrellblank today at 2:45 PM

> LLMs don't do this. Instead, every question is immediately answered with extreme confidence and a paragraph or more of text.

Having just read a load of Quora answers like this that did not cover the thing I was looking for: this is how humans on the internet behave, and how people have to write books, blog posts, articles, and documentation. Without the "dance" to choose a path through a topic on the fly, the author has to take on the burden of providing all relevant context, choosing a path, explaining why, and guessing at any objections and questions and including those as well.

It's why "this could have been an email" is a bad shout. The summary could have been an email, but the part that decided that was the summary would have been pages of guessing at everything that might have come up in the call and which things to include or exclude.

rafamct today at 1:53 PM

Yes you're totally right! I misunderstood what you meant, let me write six more paragraphs based on a similar misunderstanding rather than just trying to get clarification from you

herf today at 3:07 PM

Training data is quite literally weighted this way - long responses on Reddit have lots of tokens, and brief responses don't get counted nearly as much.

The same goes for "rules" - you train an LLM with trillions of tokens and try to regulate its behavior with thousands. If you think of a person in high school, grading and feedback are a much higher percentage of the training.

zenoprax today at 3:30 PM

ChatGPT offered a "robotic" personality which really improved my experience. My frustrations were basically decimated right away and I quickly switched to a more "You get out of it what you put in" mindset.

And less than two weeks in they removed it and replaced it with some sort of "plain and clear" personality which is human-like. And my frustrations ramped up again.

That brief experiment taught me two things: 1. I need to ensure that any robots/LLMs/mech-turks in my life act at least as cold and rational as Data from Star Trek. 2. I should be running my own LLM locally to not be at the whims of $MEGACORP.

heresie-dabord today at 4:04 PM

> The thing that bothers me the most about LLMs is

What bothers me the most is the seemingly unshakable tendency of many people to anthropomorphise this class of software tool as though it is in any way capable of being human.

What is it going to take? Actual, significant loss of life in a medical (or worse, military) context?

morksinaanab today at 8:28 PM

I suspect that's because they're trained on website content, and SEO rewards more text (see recipe websites). So the default response is fluff.

vidarh today at 6:24 PM

A lot of this, I suspect (on the basis of having worked on a supervised fine-tuning project for one of the largest companies in this space), is that providers have invested a lot of money in fine-tuning datasets that sound this way.

On the project I did work on, reviewers were not allowed to e.g. answer that they didn't know - they had to provide an answer to every prompt provided. And so when auditing responses, I found that a lot of difficult questions had "confidently wrong" answers because the reviewer had tried and failed, or all kinds of evasive workarounds because they knew they couldn't answer.

Presumably these providers will eventually understand (hopefully they already have - this was a year ago) that they also need to train the models to understand when the correct answer is "I don't know", or "I'm not sure. I think maybe X, but ..."

Archelaos today at 2:21 PM

I never expected LLMs to be like an actual conversation between humans. The model is in some respects more capable and in some respects more limited than a human. I mean, one could strive for an exact replica of a human -- but for what purpose? The whole thing is a huge association machine. It is a surrealistic inspiration generator for me. This is how it works at the moment, until the next breakthrough ...

chemotaxis today at 7:34 PM

This is not necessarily a fundamental limitation. It's a consequence of a fine-tuning process where human raters decide how "good" an answer is. They're not rating the flow of the conversation, but looking at how complete / comprehensive the answer to a one-shot question looks. This selects for walls of overconfident text.

Another thing the vendors are selecting for is safety / PR risk. If an LLM answers a hobby chemistry question in a matter-of-fact way, that's a disastrous PR headline in the making. If it opens with several paragraphs of disclaimers or just refuses to answer, that's a win.

Workaccount2 today at 3:30 PM

They are purposely trained to be this way.

In a way it's benchmaxxing because people like subservient beings that help them and praise them. People want a friend, but they don't want any of that annoying friction that comes with having to deal with another person.

wincy today at 4:26 PM

Cursor Plan mode works like this. It restricts the LLM's access to your environment and lets you iteratively ask and clarify, and it'll piece together a plan that it allows you to review before it takes any action.

ChatGPT deep research does this, but it's weird and forced because it asks one series of questions and then it's off to the races, spending a half hour or more building a report. It's frustrating if you don't know what to expect, and my wife got really mad the first time she wasted a deep research request asking it "can you answer multiple series of questions?" or some other functionality-clarifying question.

I've found Cursor's plan mode extremely useful, similar to having a conversation with a junior or offshore team member who is eager to get to work but not TOO eager. These tools are extremely useful; we just need to get the guard rails and user experience right.

jacquesm today at 3:15 PM

If you're paying per token then there is a big business incentive for the counterparty to burn tokens as much as possible.

max51 today at 7:14 PM

>LLMs don't do this

They did at the beginning. It used to be that if you wanted a full answer with an intro, bullet points, lists of pros/cons, etc., you had to explicitly ask for it in the prompt. The answers were also a lot more influenced by the tone of the prompt instead of being forced into a specific format like they are right now.

quietbritishjim today at 7:14 PM

That just means that you need to learn to adapt to the situation: Make your prompt a carefully crafted multi-paragraph description of every detail of the problem and what you want from the solution, with bullet points if appropriate.

Maybe it feels a bit sad that you have to follow what the LLM wants, but that's just how any tool works, really.

LogicFailsMe today at 4:40 PM

My favorite description of an LLM so far is of a typical 37-year-old male Reddit user. And in that sense, we have already created the AGI.

nowittyusername today at 5:28 PM

It's not a magic technology; they can only represent data they were trained on. Naturally, most of the data in their training set is NOT conversational. Consider that such data is very limited, and who knows how it was labeled, if at all, during pretraining. But with that in mind, LLMs definitely can do all the things you describe - a very robust and well-tested system prompt has to be used to coax this behavior out. Also, a proper model has to be used, as some models are simply not trained for this type of interaction.

luijk today at 6:39 PM

By default they don't ask questions. You can craft that behaviour with the system message or account settings, though they will tend to ask 20 questions at once, so you have to ask them to limit it to one question at a time to get a more natural experience.
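For example, something like this is a rough sketch with the OpenAI Python SDK (the model name and exact wording here are just placeholders, not a recommendation):

    from openai import OpenAI

    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

    # Instruction that nudges the model toward a back-and-forth "dance"
    system_message = (
        "Ask at most one clarifying question if anything is ambiguous, "
        "and wait for my reply before giving a full answer."
    )

    response = client.chat.completions.create(
        model="gpt-4o",  # placeholder model name
        messages=[
            {"role": "system", "content": system_message},
            {"role": "user", "content": "Help me design a caching layer."},
        ],
    )
    print(response.choices[0].message.content)

The same instruction also works pasted into the account-level custom instructions box mentioned above.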

zby today at 6:34 PM

When I expect it to do that I just end my prompt with '. Discuss' - usually this works really well. Not exactly human-like - it tends to list all the questions and variants at once - but most come with good default answers, so I only need to engage with a couple of them.

rossant today at 2:26 PM

Lately, ChatGPT 5.1 has been less guilty of this and sometimes holds off answering fully and just asks me to clarify what I meant.

__turbobrew__ today at 6:12 PM

The day when the LLM responds to my question with another question will be quite interesting. Especially at work: when someone asks me a question, I often need to ask for clarifying information before I can answer the original question fully.

HPsquared today at 2:31 PM

There are plenty of LLM services that have a conversational style. The paragraph blocks thing is just a style.

jstummbillig today at 3:36 PM

a) I find myself fairly regularly irritated by the flow of human-human conversations. In fact, it's more common than not. Of course, I have years of practice handling that more or less automatically, so it rarely rises to the level of annoyance, but it's definitely work I bring to most conversations. I don't know about you, but that's not really a courtesy I extend to the LLM.

b) If it is, in fact, just one setting away, then I would say it's operating fairly similarly?

not_ai today at 2:14 PM

I didn't have the words to articulate some of my frustrations, but I think you summed it up nicely.

For example, there have been many times when they take things too literally instead of looking at the totality of the context and what was written. I'm not an LLM, so I don't have a perfect grasp of every vocab term for every domain, and it feels especially pandering when they repeat back the wrong word but put it in quotes or bold instead of simply asking if I meant something else.

heavyset_go today at 5:31 PM

I don't want to talk to a computer like I would a human

cortesoft today at 5:05 PM

Have you used Claude much? It often responds to things with questions

motoboi today at 2:42 PM

Reflect for a moment on the fact that LLMs currently are just text generators.

Also consider that the conversational behavior we see is just the model mimicking example conversations, so when we say "System: you are a helpful assistant. User: let's talk. Assistant:" it will complete the text in a way that mimics a conversation.

Yeah, we improved over that using reinforcement learning to steer the text generation into paths that lead to problem solving and more “agentic” traces (“I need to open this file the user talked about to read it and then I should run bash grep over it to find the function the user cited”), but that’s just a clever way we found to let the model itself discover which text generation paths we like the most (or are more useful to us).

So to comment on your discomfort: we (humans) trained the model to spill out answers (there are thousands of human beings right now writing nicely thought-out and formatted answers to common questions so that we can train the models on them).

If we try to train the models to mimic long dances into shared meaning, we will probably decrease their utility. And we wouldn't be able to do that anyway, because then we would have to have customized text traces for each individual instead of question-answer pairs.

Downvoters: I simplified things a lot here, in the name of understanding, so bear with me.

TimPC today at 1:56 PM

The benchmarks are dumb but highly followed so everyone optimizes for the wrong thing.

solumunus today at 6:22 PM

You just need to be more explicit. Including “ask clarifying questions” in your prompt makes a huge difference. Not sure if you use Claude Code but if you do, use plan mode for almost every task.

DoneWithAllThat today at 4:45 PM

When using an LLM for anything serious (such as at work) I have a standard canned postscript along the lines of “if anything about what I am asking is unclear or ambiguous, or if you need more context to understand what I’m asking, you will ask for clarification rather than try to provide an answer”. This is usually highly effective.
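If you call the model from scripts rather than a chat UI, the same idea is just a tiny wrapper. A rough Python sketch (the helper name is made up; the postscript text is the part that matters):

    # Standard postscript appended to every serious prompt (wording from above).
    CLARIFY_POSTSCRIPT = (
        "\n\nIf anything about what I am asking is unclear or ambiguous, or if you "
        "need more context to understand what I'm asking, ask for clarification "
        "rather than trying to provide an answer."
    )

    def with_clarification(prompt: str) -> str:
        """Append the canned 'ask before answering' postscript to a prompt."""
        return prompt + CLARIFY_POSTSCRIPT

    print(with_clarification("Summarize the tradeoffs between approach A and approach B."))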

catigula today at 4:31 PM

Claude doesn't really have this problem.

dominotw today at 4:38 PM

Same experience. I try to learn with it, but I can't really tell if what it's teaching me is actually correct or if it's merely making things up when I challenge it with follow-up questions.

gowld today at 5:46 PM

There are billions of humans. Not every one speaks the same way all the time. The default behavior is trying to be useful for most people.

It's easy to skip and skim content you don't care about. It's hard to prod and prod to get it to say something you do care about if the machine is trained to be very concise.

Complaining the AI can't read your mind is exceptionally high praise for the AI, frankly.

morkalork today at 2:06 PM

This drives me nuts when trying to bounce an architecture or coding solution idea off an LLM. A human would answer with something like "what if you split up the responsibility and had X service or Y whatever". No matter how many times you tell the LLM not to return code, it returns code. Like it can't think or reason about something without writing it out first.

Traubenfuchs today at 2:55 PM

> When I ask a person something, I expect them to give me a short reply which includes another question/asks for details/clarification. A conversation is thus an ongoing "dance" where the questioner and answerer gradually arrive at the same shared meaning.

You obviously never wasted countless hours trying to talk to other people on online dating apps.

bwahah4 today at 6:09 PM

In the US anyway, most adults read at a middle school level.

It's not "masquerading as a human". The majority of humans are functionally illiterate and only understand the world through the elementary principles of their local culture.

It's the minority of the human species, the ones who engage in what amounts to little more than arguing semantics, that need the reality check. Unless one is involved in work that directly impacts public safety (defined as harm to biology), the demand to apply one concept or another is arbitrary preference.

Healthcare, infrastructure, and essential biological support services are all most humans care about. Everything else the majority see as academic wank.