Interesting, but couldn't a model "cheat" in this task by being very good at telling ...

vintermann • last Friday at 12:32 PM • 1 reply

Interesting, but couldn't a model "cheat" in this task by being very good at telling model outputs apart? How far do you get with a classifier simply trained to distinguish models by their output?

It seems to me that many models, maybe by design, have a recognizable style, and detecting that style would be much easier than evaluating the factual quality of answers.
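To make the question concrete, here is a minimal sketch of such a style classifier: a bag-of-words naive Bayes model that guesses which model wrote a piece of text. Everything in it is an assumption for illustration: the labels `model_a` and `model_b`, the toy strings in `TOY_OUTPUTS`, and the helper names `train` and `predict` are all made up, and it says nothing about how well this would work on real model outputs.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Crude whitespace tokenizer; punctuation stays attached to words,
    # which keeps stylistic tics like "Certainly!" as distinct features.
    return text.lower().split()


def train(samples):
    """Fit a multinomial naive Bayes model on (label, text) pairs."""
    word_counts = defaultdict(Counter)  # label -> token -> count
    doc_counts = Counter()              # label -> number of texts
    vocab = set()
    for label, text in samples:
        doc_counts[label] += 1
        for tok in tokenize(text):
            word_counts[label][tok] += 1
            vocab.add(tok)
    return word_counts, doc_counts, vocab


def predict(model, text):
    """Return the label with the highest log-posterior for the text."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for label in doc_counts:
        score = math.log(doc_counts[label] / total_docs)
        # Laplace (add-one) smoothing so unseen tokens do not zero out a label.
        denom = sum(word_counts[label].values()) + len(vocab)
        for tok in tokenize(text):
            score += math.log((word_counts[label][tok] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy data: two invented "models" with exaggerated styles.
TOY_OUTPUTS = [
    ("model_a", "Certainly! Here is a concise overview of the topic."),
    ("model_a", "Certainly! Here is a short summary of the key points."),
    ("model_b", "ok so basically the answer is pretty simple"),
    ("model_b", "ok so basically you just need to do this"),
]

clf = train(TOY_OUTPUTS)
print(predict(clf, "Certainly! Here is the answer."))
print(predict(clf, "ok so basically it works"))
```

Note that the classifier never looks at whether an answer is correct, only at surface wording, which is exactly the shortcut I am worried about.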

Replies

nestorD • last Friday at 6:58 PM

In theory, yes! If this metric ever becomes a widely used standard, one would have to start accounting for that...

But in practice, when a model is asked to pick the best answer, it sees a single question / answers pair and focuses on determining which answer it thinks is best.
