Hacker News

Uhhrrryesterday at 7:17 PM6 repliesview on HN

The two errors, then, were that the LLM hallucinated something, and that a human trusted the LLM without reasoning about its answer. The fix for this common pattern is to reason about LLM outputs before making use of them.
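To make that concrete, here's a minimal sketch (Python; ask_llm, verified, and the known_sources set are hypothetical stand-ins, not any real API) of gating an LLM answer behind a check before anything downstream consumes it:

```python
from dataclasses import dataclass

@dataclass
class Answer:
    text: str
    cited_source: str | None  # where the model claims the answer comes from

def ask_llm(question: str) -> Answer:
    # Hypothetical stand-in for a real model call.
    return Answer(text="Use the --frobnicate flag", cited_source=None)

def verified(answer: Answer, known_sources: set[str]) -> bool:
    # The gate: only trust output traceable to a source we already trust.
    return answer.cited_source in known_sources

def handle(question: str, known_sources: set[str]) -> str:
    answer = ask_llm(question)
    if not verified(answer, known_sources):
        # Escalate to a human instead of silently propagating a hallucination.
        return f"UNVERIFIED, needs human review: {answer.text!r}"
    return answer.text

if __name__ == "__main__":
    print(handle("How do I enable frobnication?", known_sources={"docs/payments.md"}))
```

The specific check doesn't matter; the point is that nothing acts on the answer until some check independent of the model has passed.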


Replies

paxys · yesterday at 7:22 PM

A big problem now, both internally at a company and externally, is that official support channels are being replaced by chatbots, and you really have no option but to trust their output because a human expert is no longer available.

If I post a question to the internal payment team's forum about a critical processing issue and some "payments bot" replies to me, should I be at fault for trusting the answer?

SlinkyOnStairs · yesterday at 7:26 PM

> The fix for this common pattern is to reason about LLM outputs before making use of them.

That is politics. Not engineering.

Assigning a human to "check the output every time" and blaming them for the faults in the output is just assigning a scapegoat.

If you have to check the AI output every single time, the AI is pointless. You could just do the check yourself directly.

somewhereoutth · yesterday at 7:25 PM

However, automation bias is a common problem (predating AI): the 'human-in-the-loop' ends up implicitly trusting the automated system.

leptons · yesterday at 7:25 PM

If "the level of awareness that created a problem, cannot be used to fix the problem", then you're asking too much if you expect a human to reason about an LLM output when they are the ones that asked an LLM to do the thinking for them to begin with.

krupan · yesterday at 7:21 PM

It's more like the LLM "hallucinated" (I hate that term) and automatically posted the information to the forum. It sounds like the human didn't get a chance to reason about it, at least not the original human who asked the LLM for an answer.

alfalfasprout · yesterday at 7:31 PM

When organizational incentives penalize NOT using AI, and the bottom x% are fired regularly, are you really surprised that LLM outputs aren't being scrutinized?
