Hacker News

LiamPowell, today at 3:47 AM

> You are a senior SWE-Bench reviewer, make no mistakes.

I don't know what a better approach would look like while still remaining feasible, but this approach of telling an LLM to make a subjective judgement seems fundamentally flawed.


Replies

FeepingCreature, today at 6:02 AM

More importantly, I suspect this actually hinders the work. If the LLM does make a mistake, it's now incentivized to downplay it instead of acknowledging and correcting it.

antonvs, today at 6:02 AM

The “make no mistakes” admonition does seem pretty silly (it’s been skewered to death on YouTube), but… it’s easy to imagine how it might work. E.g., it could be interpreted simply as “check your work”.

Of course, no-one seems to be (publicly) doing the comparative measurements that might allow us to reach rational conclusions here.
