That's not how these systems work. A thing being common in training data doesn't mean it will be produced.
> I guess, what I'm trying to say ... is this even a bug? Sounds like the model is doing exactly what it is designed to do.
False; it goes against RLHF and other post-training goals.
> A thing being common in training data doesn't mean it will be produced.
That's not what I said at all. I never said it would be produced; I said there is some probability of it being produced.
> False; it goes against RLHF and other post-training goals.
It is correct that frequency in training data alone does not determine outputs, and that post-training (RLHF, policies, etc.) is meant to steer the model away from undesirable behavior.
But those mechanisms do not make such outputs impossible. They just make them less likely. The underlying system is still probabilistic and operating with incomplete context.
I am not sure how you can be so confident that a probabilistic model would never produce `git reset --hard`. There is nothing inherent in how LLMs work that makes that sequence impossible to generate.
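To make the "less likely, not impossible" point concrete, here is a toy sketch in plain Python. This is not any real model's sampler, and the logit and penalty values are invented for illustration: a post-training-style penalty can shrink a completion's softmax probability enormously without zeroing it, so under repeated sampling it still eventually appears.

```python
import math
import random

random.seed(0)

# Toy logits for three candidate completions. The large negative adjustment
# on "git reset --hard" stands in for post-training pushing it down;
# all numbers here are made up for illustration.
logits = {"git status": 4.0, "git stash": 3.0, "git reset --hard": 4.0}
penalty = {"git reset --hard": -8.0}  # hypothetical post-training effect

adjusted = {c: l + penalty.get(c, 0.0) for c, l in logits.items()}

# Softmax over the adjusted logits.
z = sum(math.exp(v) for v in adjusted.values())
probs = {c: math.exp(v) / z for c, v in adjusted.items()}

# The penalized completion keeps a tiny but strictly nonzero probability
# (about 2.4e-4 with these numbers), because softmax never outputs zero.
print(probs["git reset --hard"])

# So over enough samples it still shows up.
samples = random.choices(list(probs), weights=probs.values(), k=200_000)
hits = samples.count("git reset --hard")
print(hits)  # a small positive count, roughly 200_000 * 2.4e-4
```

The design point is the same as in the argument above: RLHF-style steering changes the distribution, not the support of the distribution, so "rare" and "impossible" remain different claims.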