That may be so, but the rest of the models are so thoroughly terrified of questioning liberal US orthodoxy that it’s painful. I remember seeing a hilarious comparison of models where most of them feel that it’s not acceptable to “intentionally misgender one person” even in order to save a million lives.
Elon was talking about that too on the Joe Rogan podcast.
Relying on an LLM to "save a million lives" through its own actions is irresponsible design.
In which situation did an LLM save one million lives? Or worse, was able to but failed to do so?
Anything involving what sounds like genetics often gets blocked. It depends on the day, really, but try doing something with ancestral clusters and diversity restoration and the models can be quite "safety blocked".
You're anthropomorphizing. LLMs don't 'feel' anything or have orthodoxies; they're pattern matching against training data that reflects what humans wrote on the internet. If you're consistently getting outputs you don't like, you're measuring the statistical distribution of human text, not model 'fear.' That's the whole point.
Also, just because I was curious, I asked my magic 8-ball if you gave off incel vibes and it answered "Most certainly".
The LLM is correctly not answering a stupid question, because saving an imaginary million lives is not the same thing as actually doing it.
If someone's going to ask you gotcha questions that they're then going to post on social media to use against you, or against other people, it helps to have prepared statements to defuse that.
The model may not be able to detect bad-faith questions, but the operators can.
I thought this would be inherent just from their training? There are multitudes more Reddit posts than scientific papers or encyclopedia-type sources. Although I suppose the latter have their own biases as well.