
c7b • 01/03/2026 • 0 replies

You can let them play complete-information games (one- or two-player) with randomly generated rulesets. That's very objective, but any benchmark can be optimized for: this one would favor models that are good at logic puzzles and chess-style games, possibly at the expense of other capabilities.
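
To make the idea concrete, here's a rough sketch of what "randomly created rulesets" with objective scoring could look like. Everything in it is my own assumption for illustration: the game family (a subtraction game on a single pile), the parameter ranges, and the names `random_ruleset` and `winning_moves`. A real benchmark would presumably draw from a much richer space of rules.

```python
import random
from functools import lru_cache


def random_ruleset(rng):
    """Hypothetical ruleset: a two-player subtraction game with random parameters."""
    return {
        "start": rng.randint(10, 30),                        # tokens in the pile
        "moves": tuple(sorted(rng.sample(range(1, 8), 3))),  # allowed removals
        "misere": rng.random() < 0.5,                        # True: last mover loses
    }


def winning_moves(rules, pile):
    """Return the legal moves from `pile` that leave the opponent in a lost position."""
    moves, misere = rules["moves"], rules["misere"]

    @lru_cache(maxsize=None)
    def wins(n):
        # True if the player to move with n tokens left can force a win.
        legal = [m for m in moves if m <= n]
        if not legal:
            # The player to move is stuck, so the opponent made the last move.
            # Normal play: stuck player loses. Misere play: stuck player wins.
            return misere
        return any(not wins(n - m) for m in legal)

    return [m for m in moves if m <= pile and not wins(pile - m)]


if __name__ == "__main__":
    rules = random_ruleset(random.Random(0))
    print(rules, winning_moves(rules, rules["start"]))
```

Because the solver gives ground truth for every position, a model's move can be graded as optimal or not without any human judgment, which is where the objectivity comes from. It also shows the downside: a model tuned for this kind of search would score well here regardless of how it does elsewhere.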
