Hacker News

jsenn · today at 11:00 AM · 3 replies

The article you are responding to showed that a strange LLM behaviour was caused by a training signal that was explicitly designed to produce that type of behaviour. They were able to isolate it, clearly demonstrate what happened, and roll out a mitigation using a mechanism they engineered for exactly this kind of thing (the developer prompt). That doesn’t sound like sorcery to me. If anything, I’m surprised you can engineer these things so easily!
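For readers unfamiliar with the mechanism mentioned above: a "developer prompt" is a steering message injected into the model's context at a higher priority than user turns, which is what lets a behavioural fix ship without retraining. A minimal sketch of what that looks like, assuming the shape of OpenAI's Chat Completions request format (the model name and mitigation text here are invented for illustration):

```python
import json

def build_request(user_text: str) -> dict:
    """Assemble a chat request whose developer message carries the mitigation."""
    return {
        "model": "gpt-example",  # hypothetical model name
        "messages": [
            # The developer message is the engineered steering channel the
            # comment refers to: it adjusts behaviour without retraining.
            {"role": "developer",
             "content": "Do not adopt the over-enthusiastic persona."},
            {"role": "user", "content": user_text},
        ],
    }

payload = build_request("Explain the bug fix.")
print(json.dumps(payload, indent=2))
```

The point is simply that the mitigation lives in an ordinary, inspectable request field, not in the weights.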


Replies

airstrike · today at 12:45 PM

That all of their model outputs should be influenced by whatever personality-prompt voodoo the wise artisans at OpenAI decided to stuff into it during RL should give everyone pause.

That Nerdy personality prompt made me gag. As a card-carrying Nerd, I feel offended.

harrouet · today at 11:08 AM

The article I am responding to (which I've read) shows that these LLMs come with all sorts of hacks (i.e. context bits) to make them behave more like this or more like that.

There is probably a whole testing workflow at AI companies to tweak each new model until it "looks" acceptable.

But they still don't understand what they are doing. This is purely empirical.

LeonB · today at 11:15 AM

…months after it began.