Hacker News

r_lee today at 12:54 PM

I don't think it's a training issue; it's simply that there's no inherent "I don't know" in the transformer architecture. Unless the input is something completely unknown, the nearest neighbor will be chosen, and that will be whatever sounds similar or relevant, even if it might cause a problem.


Replies

feoren today at 2:46 PM

The final output of the neural-network part of an LLM is a vector of weights over every token, which is then usually softmaxed and sampled from. Can't we quantify the uncertainty by looking at the distribution of weights across the top 10 options? For a note-taking app we'd expect the top choice to be something like 98% certain, and if the model instead gives a weight of 60% to "Russia" and 30% to "France", that's just not enough certainty to simply output "Russia". That's exactly when it should say "<uncertain>" or something instead.
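Something like this rough sketch of the idea, assuming you have access to the raw next-token logits (the 0.9 cutoff, the function name, and the abstain marker are all made up for illustration):

    import torch

    def pick_or_abstain(logits: torch.Tensor, threshold: float = 0.9):
        """Greedy next-token choice that abstains when the model isn't sure.

        `logits` is the raw score vector over the vocabulary for the next
        position; `threshold` is an arbitrary cutoff for this illustration.
        """
        probs = torch.softmax(logits, dim=-1)  # turn raw scores into a probability distribution
        top = torch.topk(probs, k=10)          # inspect the 10 highest-weight candidates
        if top.values[0].item() < threshold:
            # e.g. 60% "Russia" vs 30% "France": not certain enough, so abstain
            return "<uncertain>", top
        return int(top.indices[0]), top        # otherwise return the winning token id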

aspenmartin today at 1:10 PM

It's not inherent in the transformer architecture; we do try to ingrain a sense of uncertainty, but it's difficult, not only technically but also philosophically/culturally. How confident do you want the model to be in its answer to "why did Rome fall"?

There are lots of tools in our toolbelt for better uncertainty calibration, but it trades off against other capabilities, and it can actually be rather frustrating to interact with in agentic contexts, since the model will constantly need input from you or otherwise be indecisive and overly cautious. It's not technically a limitation of the transformer architecture, but it is more challenging to deal with than in other architectures/statistical paradigms.

Like, you can maintain a belief state, generate conditioned on it, and train to ensure the belief state stays stable and performant. But evals reward guessing at this point, and it's very, very hard to evaluate calibration in these open-ended contexts. But we're slowly getting there, just not nearly as fast as other capabilities.
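To make "evaluate the calibration" concrete in the easy, multiple-choice-style setting, here's a minimal expected-calibration-error sketch (the bin count and the toy inputs are arbitrary; the hard part described above is getting anything like this to work for open-ended generation):

    import numpy as np

    def expected_calibration_error(confidences, correct, n_bins=10):
        """Bin predictions by stated confidence and compare each bin's average
        confidence to its empirical accuracy; a well-calibrated model that says
        "90% sure" should be right about 90% of the time."""
        confidences = np.asarray(confidences, dtype=float)
        correct = np.asarray(correct, dtype=float)
        bins = np.linspace(0.0, 1.0, n_bins + 1)
        ece = 0.0
        for lo, hi in zip(bins[:-1], bins[1:]):
            mask = (confidences > lo) & (confidences <= hi)
            if mask.any():
                gap = abs(confidences[mask].mean() - correct[mask].mean())
                ece += mask.mean() * gap  # weight by the fraction of samples in this bin
        return ece

    # A model that guesses confidently and is often wrong scores poorly:
    print(expected_calibration_error([0.95, 0.9, 0.85, 0.6], [1, 0, 0, 1]))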

user_7832 today at 1:56 PM

The thing is, if LLMs are stochastic parrots predicting the next word (i.e., a partially decent autocomplete), there's no reason they can't complete <specific question it can't answer> with "I don't know", since that's a perfectly valid sentence too.

That's why I'm still cautiously optimistic about LLMs someday being good enough. I don't know if or when someone will manage to do it, but I'm hopeful.
