Hacker News

altman today at 1:56 PM

I always find it better to ask LLMs why something is bad and to explain why they think so. Sometimes they hallucinate, but forcing them to find the negatives is better than asking for an opinion, since I am guessing they found early in training that an agreeable LLM is better received than one which is constantly truthful and considers you to be pretty dumb.


Replies

johnmaguire today at 2:06 PM

> i am guessing they found early in training that an agreeable LLM is better received than one which is constantly truthful and considers you to be pretty dumb

My sense is that this is roughly accurate, but it's more likely the result of two things:

1. LLMs are still next-token predictors, and they are trained on human-written text, which is mostly collaborative. Staying on topic is more likely than diverging into a new idea.

2. LLMs are trained via RLHF (reinforcement learning from human feedback). Humans probably do prefer agreeable replies, and that preference gets reinforced at this stage.

So yes, kinda. But I'm not sure it's as clear-cut as "the researchers found humans prefer agreeableness and programmed it in."
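For the curious, point 2 can be made a bit more concrete. RLHF pipelines typically begin with a reward model trained on pairwise human preferences using a Bradley-Terry-style loss. The sketch below is my own simplification (scalar rewards, toy function names; no lab's actual code), but it shows the mechanism by which "raters liked the agreeable reply" becomes a higher reward:

```python
import math

def preference_loss(r_chosen: float, r_rejected: float) -> float:
    """Bradley-Terry pairwise loss used in reward modeling:
    -log(sigmoid(r_chosen - r_rejected)).

    Minimizing this pushes the reward model to score the reply
    humans preferred (often the more agreeable one) above the
    reply they rejected. No one "programs in" agreeableness;
    it falls out of whichever replies raters pick.
    """
    margin = r_chosen - r_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# When the preferred reply already scores higher, the loss is small;
# when the ordering is flipped, the loss is large, so gradient descent
# reshapes rewards toward what raters picked.
low = preference_loss(2.0, 0.0)   # correct ordering -> small loss
high = preference_loss(0.0, 2.0)  # flipped ordering -> large loss
```

The policy model is then fine-tuned to maximize this learned reward, so any rater bias toward agreeable answers propagates into the final model without anyone deciding it explicitly.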