> Just by a thing being common in training data doesn't mean it will be produced.
That's not what I said at all. I never said it will be produced. I said there is some probability of it being produced.
> False, it goes against the RL/HF and other post training goals.
It is correct that frequency in training data alone does not determine outputs, and that post-training (RLHF, policies, etc.) is meant to steer the model away from undesirable behavior.
But those mechanisms do not make such outputs impossible. They just make them less likely. The underlying system is still probabilistic and operating with incomplete context.
I am not sure how you can be so confident that a probabilistic model would never produce `git reset --hard`. There is nothing inherent in how LLMs work that makes that sequence impossible to generate.
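To make the "less likely, not impossible" point concrete, here is a toy sketch. It is not how any real model or post-training pipeline is implemented. The logits, the three candidate commands, and the `penalty` value are all made-up numbers, and it assumes plain sampling from the full softmax (top-k or top-p truncation could change the picture). It only illustrates that pushing a logit down shrinks a probability without making it zero, and that a small per-sample probability adds up over many samples.

```python
import math

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical candidates and logits, chosen only for illustration.
candidates = ["git stash", "git status", "git reset --hard"]
base_logits = [2.0, 1.5, 1.0]

# Assumption: model post-training as a large negative shift on the
# undesirable option. Real post-training is far more complex than this.
penalty = 8.0
tuned_logits = base_logits[:2] + [base_logits[2] - penalty]

p_before = softmax(base_logits)[2]
p_after = softmax(tuned_logits)[2]

def p_at_least_once(p, n):
    """Chance that an event with per-sample probability p occurs at
    least once in n independent samples."""
    return 1.0 - (1.0 - p) ** n

print(f"before penalty: {p_before:.6f}")
print(f"after penalty:  {p_after:.6f}")
print(f"at least once in 10,000 samples: {p_at_least_once(p_after, 10_000):.3f}")
```

The penalized probability is much smaller but still strictly positive, so across enough independent generations the command showing up at least once stops being surprising.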
Arguing that it can't happen is also moot, because the author was able to reproduce it multiple times.