efromvt • today at 1:57 PM • 0 replies

I’d be interested in the benchmarking if you ever write it up! People do seem to assume LLM-as-a-judge/panel improves outcomes (and arguably it does in cases like code review?), but I suspect it is very situational, and the priors from human panels of experts don’t always translate cleanly.
