They clearly RLHF out the embarrassing cases and make cheating on benchmarks into a sport.

AyyEye • last Monday at 12:35 AM

I wouldn't be surprised if some models are set up to identify that type of question and run the word through a string-processing function.
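To make that concrete, here is a minimal sketch of what such a shortcut could look like. This is purely speculation, not a description of how any vendor actually does it. It assumes "that type of question" means letter-counting prompts such as "How many r's are in strawberry?"; the regex and the name `try_letter_count` are hypothetical and made up for illustration.

```python
import re
from typing import Optional

# Hypothetical pattern (an assumption, not any real system's rule):
# matches phrasings like "How many r's are in strawberry?" or
# "How many s's in the word "Mississippi"?"
LETTER_COUNT_RE = re.compile(
    r"how many\s+([a-z])'?s\s+(?:are\s+)?in\s+(?:the\s+word\s+)?\"?([a-z]+)\"?",
    re.IGNORECASE,
)


def try_letter_count(question: str) -> Optional[int]:
    """Return the letter count if the question matches the pattern, else None."""
    match = LETTER_COUNT_RE.search(question)
    if match is None:
        # Not a recognized letter-counting question; a real system would
        # presumably fall back to the model's normal answer here.
        return None
    letter = match.group(1).lower()
    word = match.group(2).lower()
    # Plain string processing: no tokenization quirks involved.
    return word.count(letter)


print(try_letter_count("How many r's are in strawberry?"))
print(try_letter_count("What is the capital of France?"))
```

A pattern match like this would only catch the exact phrasings it was written for, which is part of why it would look like patching embarrassing cases rather than fixing the underlying weakness.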

