anilgulecha • today at 2:01 PM

Disagree is such a loose/wimpy study. Add in a grounded/expected response, and then it becomes a better benchmark (because it'll force the author to actually think about choices presented to the LLM).

Replies

kostaj • today at 2:09 PM

Will add a human-labelled expected response and measure against it in follow-up research. This one only captures the disagreement between the models, not which model is right or wrong.
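
To make the distinction concrete, here is a rough sketch of the two measurements. It is not the study's code: the model names, answers, and expected labels are made up for illustration, and exact string matching is an assumed scoring rule. Pairwise disagreement only needs the models' outputs, while accuracy needs the human-labelled expected responses.

```python
from itertools import combinations

def pairwise_disagreement(responses):
    """Fraction of prompts on which each pair of models gives different answers."""
    rates = {}
    for a, b in combinations(sorted(responses), 2):
        pairs = list(zip(responses[a], responses[b]))
        rates[(a, b)] = sum(x != y for x, y in pairs) / len(pairs)
    return rates

def accuracy(responses, expected):
    """Fraction of prompts on which each model matches the expected response."""
    return {
        model: sum(x == y for x, y in zip(answers, expected)) / len(expected)
        for model, answers in responses.items()
    }

# Hypothetical data: two models, four prompts, human-labelled expected answers.
responses = {
    "model_a": ["yes", "no", "yes", "no"],
    "model_b": ["yes", "yes", "yes", "no"],
}
expected = ["no", "no", "yes", "no"]

print(pairwise_disagreement(responses))  # {('model_a', 'model_b'): 0.25}
print(accuracy(responses, expected))     # {'model_a': 0.75, 'model_b': 0.5}
```

In this toy data both models agree on the first prompt and are both wrong, which is exactly the case a disagreement-only measure cannot see.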
