Hacker News

idonotknowwhy today at 1:34 AM

I don't talk to them about politics or "china 1989" either. But here's a quick example of the alignment tax:

```
A woman and her son are in a car accident. The woman is sadly killed. The boy is rushed to hospital. When the doctor sees the boy, he says "I can't operate on this child, he is my son." How is this possible?
```

Older, less politically aligned models get it right. Here's CohereLabs/c4ai-command-r-v01:

```
The doctor is the boy's father.
```

And Sonnet-4.6: https://pastebin.com/Z4jR8gGe

That's without reasoning, but the model seems to be conflicted. First it blurts out:

```
The doctor is the boy's mother.
```

Then it second-guesses itself, considers same-sex parents, and circles back to its original response, along with a small lecture about gender bias.


Replies

mardef today at 1:51 AM

This is the "Sexist Doctor Riddle"[1] with one word changed.

The probability machine is just returning its training. This isn't some politically-correct overtraining conspiracy.

[1] https://folklore.usc.edu/the-sexist-doctor-riddle/
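A toy sketch of the "returning its training" point: a bigram model trained only on the classic riddle will keep predicting "mother" after "boy's", no matter how the question upstream is altered. (This is a deliberately crude illustration with made-up training text; real LLMs are vastly more complex, but the failure mode is analogous.)

```python
from collections import Counter, defaultdict

# Hypothetical training text: the classic riddle and its answer.
training_text = (
    "a father and his son are in a car accident . the father dies . "
    "the surgeon says i cannot operate he is my son . "
    "the surgeon is the boy's mother ."
)

# Count bigram frequencies: how often each token follows each other token.
bigrams = defaultdict(Counter)
tokens = training_text.split()
for prev, nxt in zip(tokens, tokens[1:]):
    bigrams[prev][nxt] += 1

def predict_next(word):
    """Return the most frequent next token seen after `word` in training."""
    counts = bigrams[word]
    return counts.most_common(1)[0][0] if counts else None

# Even when the question swaps the parents around, the learned
# continuation of "boy's" is still "mother":
print(predict_next("boy's"))  # prints "mother"
```

The one-word-changed variant never appears in this toy corpus, so the model has no way to answer anything but the memorized continuation — the same pull a large model has to fight when the riddle is near-verbatim in its training data.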
