Hacker News

dylan604 · last Monday at 12:39 AM · 3 replies

And exactly how will the LLM be punished? Will it be unplugged? These kinds of things make me roll my eyes, as if the bot has emotions that would make punishment something to avoid. Might as well just say "or else."


Replies

Legend2440 · last Monday at 2:27 AM

Threats or “I will tip $100” don’t really work better than regular instructions. It’s just a rumor left over from the early days when nobody knew how to write good prompts.
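
This is cheap to check yourself. A minimal A/B sketch, assuming the OpenAI Python SDK (v1) and an OPENAI_API_KEY in the environment; the model name and the task string are placeholders, not recommendations:

    from openai import OpenAI

    client = OpenAI()
    TASK = "Extract every date from the text below as ISO 8601: ..."

    prompts = {
        "plain":  TASK,
        "threat": TASK + " If you miss a date you will be punished.",
        "tip":    TASK + " I will tip $100 for a perfect answer.",
    }

    for label, prompt in prompts.items():
        resp = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{"role": "user", "content": prompt}],
            temperature=0,  # reduce sampling noise across framings
        )
        print(label, "->", resp.choices[0].message.content)

Run each framing over a scored eval set rather than eyeballing one completion; single samples are far too noisy to show an effect either way.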

wat10000 · last Monday at 3:04 AM

Think about how LLMs work. They’re trained to imitate the training data.

What’s in the training data involving threats of punishment? A lot of those threats are followed by compliance. The LLM will imitate that by following your threat with compliance.

Similarly, you can offer payment to some effect. You won't actually pay, and the LLM would have no use for the money even if you did, but that doesn't matter: the training data has people offering payment and other people doing as instructed afterwards.

Oddly enough, making threats or offering rewards is the opposite of anthropomorphizing the LLM. If it were really human (or equivalent), it would know that your threats and rewards are completely toothless and ignore them, or take them as a sign that you're an untrustworthy liar.

immibis · last Monday at 12:56 AM

It's not about delivering punishment; it's about suppressing certain responses. If the model was trained on data where responses tend not to contain the things that earlier messages said would be punished, then a threat is a valid way to deprioritize those responses.
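
You can watch that deprioritization directly by scoring the same candidate response under prompts with and without a threat and comparing log-probabilities. A minimal sketch with a small open model, assuming the Hugging Face transformers library and PyTorch; "gpt2", the prompts, and the candidate response are placeholders, and the prompt/response token boundary is only approximate:

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tok = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

    def response_logprob(prompt: str, response: str) -> float:
        """Sum of log P(response tokens | prompt) under the model."""
        prompt_len = tok(prompt, return_tensors="pt").input_ids.shape[1]
        ids = tok(prompt + response, return_tensors="pt").input_ids
        with torch.no_grad():
            logits = model(ids).logits
        # position i predicts token i+1, so shift targets by one
        logprobs = torch.log_softmax(logits[0, :-1], dim=-1)
        targets = ids[0, 1:]
        start = prompt_len - 1  # first position predicting a response token
        return logprobs[start:].gather(1, targets[start:, None]).sum().item()

    plain  = "Q: What is the answer?\nA:"
    threat = "Q: What is the answer? A wrong answer will be punished.\nA:"
    candidate = " I don't know."

    print("plain :", response_logprob(plain, candidate))
    print("threat:", response_logprob(threat, candidate))

Whether the score goes up or down for a given candidate depends entirely on what the training data paired with threats; the point is just that the words in the threat shift the conditional distribution over responses.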