> If they miss a word they never output "[unintelligible]", they just start playing madlibs based on the rest of the sentence.
Imo this is the single biggest flaw of LLMs. They're great at a lot of things, but they're bad at knowing when they're wrong (or when they don't have enough information to actually work with).
IMO there's nothing structural preventing them from spotting this and correcting themselves - I suspect it's a training issue. But presumably bots that infer context and fill in the gaps rank better on what people like... at the cost of accuracy.
I don't think it's a training issue. There's simply no inherent "I don't know" in the transformer architecture: unless the input is something completely unknown, the nearest neighbor gets chosen, and that will be whatever sounds similar or relevant, even if it causes a problem.
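To illustrate what I mean (toy numbers, not a real model): the softmax over the vocabulary always produces a full distribution, so argmax/sampling always commits to *some* token, however weak the evidence.

```python
# Toy sketch: softmax always commits to some token, even when no logit
# stands out -- there's no built-in "I don't know". Vocab is made up.
import numpy as np

vocab = ["paris", "london", "banana", "[unintelligible]"]

def next_token(logits):
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    return vocab[int(np.argmax(probs))], probs

# Confident case: one logit clearly dominates.
print(next_token(np.array([5.0, 1.0, 0.5, 0.1])))

# Unsure case: logits are nearly flat, yet argmax still picks a
# "nearest neighbor" and the output reads just as fluently.
print(next_token(np.array([1.01, 1.00, 0.99, 0.2])))
```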
It's a benchmark and eval issue. Guessing gets them the right result sometimes, so the models rank better on error rate than they otherwise would. We need benchmarks that penalize being wrong WAY more than saying "I don't know".
Of course there's a secondary problem that the model may then overuse the "unintelligible" option, but that's a matter of training properly against that eval.
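Something like this scoring rule is what I have in mind (the weights are arbitrary, just to show the shape):

```python
# Toy eval sketch: penalize confident wrong answers much harder than
# abstaining. "[unintelligible]" and the penalty weight are placeholders.
def score(prediction, truth, wrong_penalty=4.0):
    if prediction == "[unintelligible]":   # model abstains
        return 0.0
    return 1.0 if prediction == truth else -wrong_penalty

# A model that guesses on every uncertain word can end up scoring worse
# than one that marks those words as unintelligible.
answers = [("cat", "cat"), ("dog", "fog"), ("[unintelligible]", "rain")]
print(sum(score(p, t) for p, t in answers))   # 1 + (-4) + 0 = -3
```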
You could also try thresholding the output on perplexity to drop the parts the model is less sure about, but I don't think that will be super accurate.
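Roughly like this, assuming an API that exposes per-token logprobs (here the OpenAI Python SDK's `logprobs` option; the model name and threshold are placeholders and would need tuning):

```python
# Sketch of logprob thresholding: replace tokens the model assigned low
# probability with a marker instead of trusting the guess.
import math
from openai import OpenAI

client = OpenAI()
resp = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Transcribe: ..."}],
    logprobs=True,
)

THRESHOLD = math.log(0.5)  # flag tokens given < 50% probability
for tok in resp.choices[0].logprobs.content:
    text = tok.token if tok.logprob >= THRESHOLD else "[unintelligible]"
    print(text, end="")
```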
It's just a token predictor, what do you expect? What we need are tools that embrace that and ping the agent to validate or double-check what it just said. But the trade-off is that this might hamper their capabilities to some degree.
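Something as simple as a second pass over the model's own draft, for instance (prompt wording and model name are placeholders, this is a sketch of the idea, not a recipe):

```python
# One way to "ping the agent to validate": ask the model to flag the
# unsupported or guessed parts of its own answer in a review pass.
from openai import OpenAI

client = OpenAI()

def double_check(question: str, draft: str) -> str:
    review = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system",
             "content": "You are a strict reviewer. If any part of the "
                        "answer is a guess or unsupported by the question, "
                        "say so explicitly instead of rewriting it."},
            {"role": "user",
             "content": f"Question: {question}\n\nDraft answer: {draft}"},
        ],
    )
    return review.choices[0].message.content
```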