Hacker News

AyyEye last Monday at 12:35 AM

They clearly RLHF out the embarrassing cases and make cheating on benchmarks into a sport.


Replies

Terr_ last Tuesday at 11:33 PM

I wouldn't be surprised if some models get set up to identify that type of question and run the word through a string-processing function.
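
A minimal sketch of what that kind of routing might look like, assuming "that type of question" means letter-counting prompts such as "how many r's are in strawberry". The regex, the function name `route_question`, and the fall-through convention are all hypothetical, not any vendor's actual setup.

```python
import re
from typing import Optional

# Hypothetical pattern for one assumed question type:
# "how many <letter>'s are in <word>". Real prompts vary far more than this.
_LETTER_COUNT = re.compile(
    r"how many\s+['\"]?([a-z])['\"]?(?:'s|s)?\s+(?:are\s+)?in\s+"
    r"(?:the\s+word\s+)?['\"]?([a-z]+)['\"]?",
    re.IGNORECASE,
)


def route_question(question: str) -> Optional[int]:
    """Return an exact letter count if the question matches, else None.

    None means "not this type of question", so the caller would hand the
    prompt to the model as usual.
    """
    match = _LETTER_COUNT.search(question)
    if match is None:
        return None
    letter = match.group(1).lower()
    word = match.group(2).lower()
    # Plain string processing: no tokenization involved.
    return word.count(letter)


if __name__ == "__main__":
    print(route_question("How many r's are in strawberry?"))
    print(route_question("What is the capital of France?"))
```

A special case like this would only cover phrasings the pattern anticipates, so a slightly reworded question would still go to the model.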
