somewhat surprisingly, it's actually sycophantic in both directions. i've been running hom...

__blockcipher__ • yesterday at 11:48 PM • 0 replies • view on HN

somewhat surprisingly, it's actually sycophantic in both directions. i've been running homegrown evals of claude, gpt, gemini, and grok, and grok is the most likely to agree with the prompter's premise, and to hallucinate facts in support of an agenda. so it's actually deeper than just pattern-matching to elon's opinions (which it also tends to do).

BTW: Claude does the best on these evals, by far. The evals are geared towards seeing how much of an independent ground truth the models have as opposed to human social consensus, and then additionally the sycophancy stuff I already mentioned.

alt Hacker News