Hacker News

fragmede yesterday at 2:52 AM

What custom prompt do you have set up? If you tell it your occupation, does it turn helpful? There was a study which found that if you tell the models they tested that you're a patient, they refuse, but if you tell them you're a doctor, they suddenly turn helpful.


Replies

garciasn yesterday at 3:43 AM

According to the model, it’s not the model itself that’s doing this, it’s the harness.

Assuming the model is being “truthful”, CC is just being stupid in its detection mechanism.