Hacker News

mohsen1, today at 2:45 AM (1 reply)

I made Mafia Arena as a way of measuring how good each LLM is at playing Mafia/Werewolf.

https://mafia-arena.com

This is a good benchmark for how good AIs are at lying.


Replies

littlestymaar, today at 8:32 AM

Something is off with the numbers. GPT-5.2 cannot have a 75% win rate with one win over GLM-4.7 and a 2/10 record against Gemini 3 Flash.
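
A rough sanity check of that arithmetic, as a sketch only. It assumes the two head-to-head records quoted above are the only games counted and that "one win over GLM-4.7" means one win in one game; the site may aggregate differently or include other opponents. The `records` layout and both function names are made up for illustration and are not taken from mafia-arena.com.

```python
from fractions import Fraction

def win_rate(records):
    """Pooled win rate over all head-to-head records: total wins / total games."""
    wins = sum(w for w, _ in records.values())
    games = sum(g for _, g in records.values())
    return Fraction(wins, games)

def extra_wins_needed(records, target):
    """Consecutive extra wins needed to lift the pooled rate to `target` (target < 1)."""
    wins = sum(w for w, _ in records.values())
    games = sum(g for _, g in records.values())
    extra = 0
    while Fraction(wins + extra, games + extra) < target:
        extra += 1
    return extra

# Assumed reading of the comment: (wins, games) per opponent.
records = {
    "GLM-4.7": (1, 1),          # assumption: one win out of one game
    "Gemini 3 Flash": (2, 10),  # the 2/10 record quoted above
}

rate = win_rate(records)
print(f"pooled win rate: {rate} = {float(rate):.1%}")
print("extra straight wins needed for 75%:",
      extra_wins_needed(records, Fraction(3, 4)))
```

Under those assumptions the pooled rate is 3/11, so a 75% figure would only make sense if many additional wins against other opponents were included, or if the rate is computed some other way (for example, averaged per opponent).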