logoalt Hacker News

simianwordstoday at 12:38 AM1 replyview on HN

If it reliably reproduces something undesirable with statistical significance, then it is a bug. It can be fixed with RLHF.


Replies

throw5today at 12:53 AM

Yes, if something is reproducible and undesirable, it is a bug and RLHF can reduce it. I'm not disupting that. "reduce" is the keyword here. You can't eliminate them entirely.

My point is that fixing one bug does not eliminate the class of bugs. Heck, it does not even fix that one bug deterministically. You only reduce its probability like you rightly said.

With git commands, there is not like a system like Lean that can formally reject invalid proofs. Really I think the mathematicians have got it easier with LLMs because a proof is either valid or invalid. It's not so clear cut with git commands. Almost any command can be valid in some narrow context, which makes it much harder to reject undesirable outputs entirely.

Until the underlying probabilities of undesirable output become negligible so much that they become practically impossible, these kinds of issues will keep surfacing even if you address individual bugs. Will the probabilities become so low someday that these issues are practically impossible? Maybe. But we are not there yet. Until then, we should recalibrate our expectations and rely on deterministic safeguards outside the LLM.

show 1 reply