LoganDark • today at 2:32 PM

Two things:

- Good logits and a good sampling strategy can make cases like those exceptionally unlikely -- unlikely enough that one can assume it won't realistically happen.

- Once a bad path is sufficiently taken, it tends to be a lot more likely to continue.
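A minimal sketch of the first point. It assumes nucleus (top-p) sampling as the "sampling strategy" (the comment doesn't name one) and uses made-up logits in which index 3 stands in for a bad continuation. Plain softmax leaves the bad token with a small but nonzero probability. Truncating to the top-p set drops it from the candidate pool entirely.

```python
import math
import random


def softmax(logits, temperature=1.0):
    # Scale logits by temperature, then normalize them into probabilities.
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_p_filter(probs, top_p=0.9):
    # Keep the smallest set of most-likely tokens whose cumulative
    # probability reaches top_p. Every other token gets probability 0,
    # and the survivors are renormalized to sum to 1.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = set(), 0.0
    for i in order:
        kept.add(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    mass = sum(probs[i] for i in kept)
    return [probs[i] / mass if i in kept else 0.0 for i in range(len(probs))]


def sample(probs, rng):
    # Draw one token index according to the (possibly truncated) distribution.
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


if __name__ == "__main__":
    logits = [5.0, 4.0, 0.0, -2.0]  # hypothetical; index 3 is the "bad" token
    probs = softmax(logits)
    filtered = top_p_filter(probs, top_p=0.9)
    print("softmax:", [round(p, 4) for p in probs])
    print("top-p:  ", [round(p, 4) for p in filtered])
    rng = random.Random(0)
    print("sample:", sample(filtered, rng))
```

With these toy numbers, the first two tokens already cover more than 90% of the mass, so the filter zeroes out indices 2 and 3. Real models apply this over a vocabulary of tens of thousands of tokens, but the mechanism is the same.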

This leads to real-world advice:

- If a model refuses your request, do not argue with the refusal; edit your original message or otherwise regenerate -- the presence of refusal tells the model it should continue refusing.

- More generally, don't allow the model context to get contaminated with behavior or commands you don't like -- describing what to do is more effective than describing what NOT to do.

- It's rude to push unreviewed model outputs onto others.
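The "edit, don't argue" advice can be sketched as an operation on the chat history. The `Message` type and both helper functions are hypothetical and do not come from any particular provider's API. The actual model call is left out. The point is only which messages end up in the context the model sees next.

```python
from dataclasses import dataclass


@dataclass
class Message:
    role: str  # "user" or "assistant"
    content: str


def edit_and_regenerate(history, index, new_content):
    """Return a new history that ends at an edited user message.

    Everything from `index` onward (including any refusal) is dropped,
    so the refusal never appears in the context used for the next generation.
    """
    if history[index].role != "user":
        raise ValueError("can only edit a user message")
    return history[:index] + [Message("user", new_content)]


def argue(history, rebuttal):
    # The anti-pattern: the refusal stays in context, and the argument
    # is appended after it.
    return history + [Message("user", rebuttal)]


if __name__ == "__main__":
    history = [
        Message("user", "original request"),
        Message("assistant", "I can't help with that."),
    ]
    edited = edit_and_regenerate(history, 0, "reworded request with more context")
    argued = argue(history, "Yes you can.")
    print([m.role for m in edited])
    print([m.role for m in argued])
```

If you generate from `edited`, the context contains no refusal. If you generate from `argued`, the refusal stays in context as a precedent.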

I'm not saying corner-cutting is necessarily everywhere (even though there are countless examples in the wild). I'm also not saying it always results in random stops, or that doing one thing badly makes everything else bad. What I'm saying is that bad decisions could be hidden anywhere, at any time, even if everything else looks fine. Such is the nature of current LLMs.

Replies

ben_w • today at 3:46 PM

> - Once a bad path is sufficiently taken, it tends to be a lot more likely to continue.

No, they demonstrably self-correct with injected bad tokens.

> - If a model refuses your request, do not argue with the refusal; edit your original message or otherwise regenerate -- the presence of refusal tells the model it should continue refusing.

Wish I could do that with humans :P

> - It's rude to push unreviewed model outputs onto others.

Yes, but that's not why.

Same real-world advice applies to stuff random fresh graduates make. I remember being one of those.

The incompetence is why.
