logoalt Hacker News

GalaxyNovatoday at 7:17 AM1 replyview on HN

Because LLMs are bad at reviewing code for the same reasons they are bad at making it? They get tricked by fancy clean syntax and take long descriptions / comments for granted without considering the greater context.


Replies

colechristensentoday at 8:21 AM

I don't know, I prompted Opus 4.5 "Tell me the reasons why this report is stupid" on one of the example slop reports and it returned a list of pretty good answers.[1]

Give it a presumption of guilt and tell it to make a list, and an LLM can do a pretty good job of judging crap. You could very easily rig up a system to give this "why is it stupid" report and then grade the reports and only let humans see the ones that get better than a B+.

If you give them the right structure I've found LLMs to be much better at judging things than creating them.

Opus' judgement in the end:

"This is a textbook example of someone running a sanitizer, seeing output, and filing a report without understanding what they found."

1. https://claude.ai/share/8c96f19a-cf9b-4537-b663-b1cb771bfe3f

show 4 replies