Hacker News

NewsaHackO, last Tuesday at 6:09 PM

>establish benchmarks that make sense and are reliable

In what way are current LLM coding benchmarks unreliable?


Replies

Papazsazsa, last Tuesday at 7:19 PM

They're manipulated.
