Hacker News

magnio today at 5:05 AM

I saw on Twitter that in an ML course at Tsinghua University, one of the tests asks students to write quiz questions that as many LLMs as possible fail.

What if we created a benchmark that works like this and assigns Elo ratings? Models compete head-to-head: each writes a question, a bug, or an incomplete implementation, which the opponent has to answer, fix, or finish.
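To make the rating part concrete, here is a rough sketch of the standard Elo update applied to one such match. The scoring rule is my own assumption, not an established benchmark: each model attempts the other's challenge, a model wins if it solves its opponent's challenge while the opponent fails, and anything else is a draw. The function names and the K-factor of 32 are also just illustrative choices.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Standard Elo expected score of player A against player B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def match_score(a_solved: bool, b_solved: bool) -> float:
    """Score for model A in one head-to-head round.

    Assumed rule (hypothetical): A wins if it solved B's challenge and
    B failed A's; a draw if both solved or both failed; otherwise A loses.
    """
    if a_solved == b_solved:
        return 0.5
    return 1.0 if a_solved else 0.0


def update_elo(rating_a: float, rating_b: float, score_a: float,
               k: float = 32.0) -> tuple[float, float]:
    """Return the new (rating_a, rating_b) after one match.

    score_a is 1.0 for an A win, 0.5 for a draw, 0.0 for an A loss.
    k is an assumed K-factor; the update is zero-sum, so the total
    rating is conserved.
    """
    exp_a = expected_score(rating_a, rating_b)
    delta = k * (score_a - exp_a)
    return rating_a + delta, rating_b - delta


# Two equally rated models; A solves B's challenge, B fails A's.
new_a, new_b = update_elo(1500.0, 1500.0, match_score(True, False))
print(new_a, new_b)  # 1516.0 1484.0
```

A real benchmark would also need a way to check that each challenge is actually solvable, otherwise the best strategy is to write impossible questions.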


Replies

vincnetas today at 5:39 AM

We could call this a "generative adversarial network" (GAN) :)

https://en.wikipedia.org/wiki/Generative_adversarial_network
