Hacker News

efromvt · today at 1:57 PM

I’d be interested in the benchmarking if you ever write it up! People do seem to assume that using an LLM as a judge (or a panel of LLM judges) improves outcomes, and arguably it does in cases like code review, but I suspect it is very situational and that the priors we have from human expert panels don’t always translate cleanly.