nestorD, last Friday at 6:58 PM

In theory, yes! If this metric ever becomes a widely used standard, one would have to start accounting for that...

But in practice, when a model is asked to pick the best answer, it sees a single question/answers pair and focuses on determining which answer it thinks is best.