Hacker News

bko today at 12:23 AM

> Hallucinations and poor guidance are still a regular day-to-day issue that makes it impossible for me to trust agents with anything.

I often hear this. Can you give me a question where a major LLM hallucinates or provides poor guidance? Reproducible would be great.

Just a question to stump it.


Replies

atomicnumber3 today at 12:27 AM

Just today, the LLM-based auto-review that my company enabled for all PRs edited my PR description to confidently assert that I had added a new RPC. I had not. I deleted code and nothing else. Nothing was added. The RPC it claimed I added did not exist.

This is a common occurrence.

al_borland today at 1:03 AM

LLMs are nondeterministic, so it’s impossible to make something 100% reproducible. Even if it has an issue, it might do it in a different way. If it’s well publicized, they’ll patch that very specific example, but the foundational issue is still there (like counting the R’s in strawberry).

I still regularly run into the issue where it just makes up API endpoints, CLI commands, or adds flags that simply don’t exist.

I also regularly ask it things and it gives me a bad answer, so I push back, and it says something to the effect of “you’re right, I didn’t consider that, let me look at that more”… then tells me the exact opposite of the previous response.

Or it says “thing X has never happened”, and I ask what about <insert example>, and it goes to look it up and says, “oh, thing X actually did happen.”

I run into this daily. Multiple times per day. How can I trust a system like this? Are people just blindly accepting what the LLM says as truth? Is that why people think it’s good?

toraway today at 4:00 AM

Bad example but since it literally just happened a few hours ago:

Teams Copilot meeting assistant auto-renamed a meeting title/summary that’s now prominently placed at the top to “Month end close wrap up discussion” because someone posted in chat “sorry can’t make the meeting, we’re wrapping up month end close”.

Really confused the next guy who joined the meeting and derailed things for a minute or two before we could get back on topic.

jagged-chisel today at 12:26 AM

> Reproducible would be great

Wouldn’t it be great? I’m still waiting for reproducibility from LLMs.
