Hacker News

nycdatasci, 02/20/2025, 0 replies

I think a more plausible path to gaming benchmarks would be to use watermarks in text output to identify your model, then unleash bots to consistently rank your model over opponents.