>hopefully changes the way benchmarking is done.
Yeah, the path forward is simple: check whether the submissions actually solve the problem. If they contain exploits, the entire result gets discarded.
In human multiple-choice tests, negative marking is sometimes used to discourage guessing. It feels like an exploit should cancel out several correct solutions.
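A rough sketch of what negative marking could look like for a benchmark scorer. The penalty ratio and the label names are purely illustrative assumptions, not any real benchmark's scheme:

```python
# Hypothetical negative-marking scorer: each detected exploit
# cancels out EXPLOIT_PENALTY correct solutions.
EXPLOIT_PENALTY = 5  # assumed ratio; would need tuning per benchmark

def score(results):
    """results: list of labels, each 'correct', 'wrong', or 'exploit'."""
    correct = results.count("correct")
    exploits = results.count("exploit")
    # Floor at zero so a model can't go negative overall.
    return max(0, correct - EXPLOIT_PENALTY * exploits)

print(score(["correct"] * 10 + ["exploit"]))  # 10 - 5*1 = 5
```

The point is just that one caught exploit should cost more than one missed problem, so gaming the harness is strictly worse than leaving the task unsolved.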
Also, fuzz your benchmarks
solution is simple:
if bug { dont }
/s
Could it really be that not only do we vibeslop all apps nowadays, but we also don't bother to check how the AI solved a benchmark it claims to have solved?