Hacker News

aucisson_masque, yesterday at 10:13 PM

There isn't even deepseek V4.

I'd rather trust the LM Arena leaderboard, which puts it on par with Sonnet.


Replies

gpt5, yesterday at 10:20 PM

LM Arena uses human side-by-side voting, which limits its applicability to complex tasks.

The ARC Prize leaderboard does have DeepSeek V3.2, which only scored 4% on ARC-AGI-2 (while the top models score over 80%). It also has Kimi and Qwen, but they didn't perform well either.