Lies, Damn Lies and Database Benchmarks

16 points by eigenBasis last Tuesday at 2:27 AM | 3 comments

Comments

bitlad today at 9:54 AM

Reminds me of the recent Terminal Bench controversy [1][2][3]

If there's a benchmark, people will cheat, lie, and optimize for that benchmark. Honesty depends on the compliance enforced on teams. But if compliance itself is weak, it is going to be taken advantage of. Like growing up in India: you would optimize for the exam and not for what you learn from it.

[1] https://news.ycombinator.com/item?id=47920787

[2] https://www.tbench.ai/news/leaderboard-integrity-update

[3] https://debugml.github.io/cheating-agents/

N_Lens today at 11:31 AM

Same with LLM benchmarks these days.
