Hacker News

i_have_an_idea, yesterday at 8:37 PM

Just because it performs rather poorly by comparison doesn’t mean it isn’t benchmaxxed. It can still be worse than it appears.


Replies

wasabi991011, yesterday at 8:56 PM

It isn't benchmaxxed because they are using human preference as an evaluation.