Hacker News

grahamplace yesterday at 5:11 PM

See: https://lmarena.ai/leaderboard


Replies

jasonjmcghee yesterday at 5:18 PM

Unless you overfit to benchmark-style scenarios and are worse for real-world use.