Hacker News

altman today at 1:56 PM

I always find it better to ask LLMs why something is bad and to explain why they think so. Sometimes they hallucinate, but forcing them to find the negatives is better than asking for an opinion, since I am guessing they found early in training that an agreeable LLM is better received than one which is constantly truthful and considers you to be pretty dumb.


Replies

johnmaguire today at 2:06 PM

> i am guessing they found early in training that an agreeable LLM is better received than one which is constantly truthful and considers you to be pretty dumb

My sense is that this is roughly accurate, but it's more likely the result of two things:

1. LLMs are still next-token predictors, and they are trained on human-written text, which is mostly collaborative. Staying on topic is more likely than diverging into a new idea.

2. LLMs are trained via RLHF (reinforcement learning from human feedback). Humans probably do prefer agreeable replies, and that preference gets reinforced at this stage.

So yes, kinda. But I'm not sure it's as clear-cut as "the researchers found humans prefer agreeableness and programmed it in."
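For the curious, point 2 can be made a bit more concrete. RLHF pipelines typically begin with a reward model trained on pairwise human preferences using a Bradley-Terry-style loss. The sketch below is my own simplification (scalar rewards, toy function names; no lab's actual code), but it shows the mechanism by which "raters liked the agreeable reply" becomes a higher reward:

```python
import math

def preference_loss(r_chosen: float, r_rejected: float) -> float:
    """Bradley-Terry pairwise loss used in reward modeling:
    -log(sigmoid(r_chosen - r_rejected)).

    Minimizing this pushes the reward model to score the reply
    humans preferred (often the more agreeable one) above the
    reply they rejected. No one "programs in" agreeableness;
    it falls out of whichever replies raters pick.
    """
    margin = r_chosen - r_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# When the preferred reply already scores higher, the loss is small;
# when the ordering is flipped, the loss is large, so gradient descent
# reshapes rewards toward what raters picked.
low = preference_loss(2.0, 0.0)   # correct ordering -> small loss
high = preference_loss(0.0, 2.0)  # flipped ordering -> large loss
```

The policy model is then fine-tuned to maximize this learned reward, so any rater bias toward agreeable answers propagates into the final model without anyone deciding it explicitly.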