Hacker News

throw5 · today at 12:00 AM

Isn't this a natural consequence of how these systems work?

The model is probabilistic, and sequences like `git reset --hard` are very common in training data, so they always have some nonzero probability of appearing in outputs.
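To illustrate the sampling argument with a toy model (the action names and probabilities here are hypothetical, not taken from any real system):

```python
import random

random.seed(0)

# Hypothetical next-action distribution for a coding agent.
# Because `git reset --hard` is common in training data, it keeps
# a small but nonzero probability even when it is inappropriate.
actions = ["git status", "git stash", "git reset --hard"]
probs = [0.90, 0.099, 0.001]

samples = random.choices(actions, weights=probs, k=100_000)
count = samples.count("git reset --hard")
print(count)  # on the order of 100 out of 100,000 samples
```

Even with a per-step probability of 0.1%, the destructive command shows up roughly a hundred times in a hundred thousand sampled actions; low probability is not zero probability.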

Whether such a command is appropriate depends on context that is not fully observable to the system, like whether a repository or a set of changes is disposable. Because of that, the system cannot rely purely on fixed rules and has to infer intent from incomplete information, which is itself probabilistic.

With so many layers of probability, it seems expected that such commands will occasionally be produced even when they are inappropriate for the specific situation.

Even a 0.01% failure rate (from context corruption, misinterpreted intent, or guardrail errors) would show up regularly at scale: that is 1 failure in every 10,000 queries.


Replies

simianwords · today at 12:04 AM

That's not how these systems work. Just because something is common in the training data doesn't mean it will be produced.

> I guess, what I'm trying to say ... is this even a bug? Sounds like the model is doing exactly what it is designed to do.

False; it goes against RLHF and other post-training goals.
