Not anymore. This benchmark is for LLM chess ability:

dwohnitmok • yesterday at 9:43 PM • 4 replies • view on HN

Not anymore. This benchmark is for LLM chess ability: https://github.com/lightnesscaster/Chess-LLM-Benchmark?tab=r.... LLMs are graded according to FIDE rules so e.g. two illegal moves in a game leads to an immediate loss.

This benchmark doesn't have the latest models from the last two months, but Gemini 3 (with no tools) is already at 1750 - 1800 FIDE, which is approximately probably around 1900 - 2000 USCF (about USCF expert level). This is enough to beat almost everyone at your local chess club.

Replies

cesarvarela • yesterday at 10:02 PM

Yeah, but 1800 FIDE players don't make illegal moves, and Gemini does.

➕ show 2 replies

overgard • today at 1:44 AM

They have literally every chess game in existence to train on, and they can't do better than 1800?

➕ show 1 reply

runarberg • yesterday at 10:02 PM

Wait, I may be missing something here. These benchmarks are gathered by having models play each other, and the second illegal move forfeits the game. This seems like a flawed method as the models who are more prone to illegal moves are going to bump the ratings of the models who are less likely.

Additionally, how do we know the model isn’t benchmaxxed to eliminate illegal moves.

For example, here is the list of games by Gemini-3-pro-preview. In 44 games it preformed 3 illegal moves (if I counted correctly) but won 5 because opponent forfeits due to illegal moves.

https://chessbenchllm.onrender.com/games?page=5&model=gemini...

I suspect the ratings here may be significantly inflated due to a flaw in the methodology.

EDIT: I want to suggest a better methodology here (I am not gonna do it; I really really really don’t care about this technology). Have the LLMs play rated engines and rated humans, the first illegal move forfeits the game (same rules apply to humans).

➕ show 2 replies

deadbabe • yesterday at 9:57 PM

Why do we care about this? Chess AI have long been solved problems and LLMs are just an overly brute forced approach. They will never become very efficient chess players.

The correct solution is to have a conventional chess AI as a tool and use the LLM as a front end for humanized output. A software engineer who proposes just doing it all via raw LLM should be fired.

➕ show 1 reply

alt Hacker News

Replies