Show HN: TetrisBench – Gemini Flash reaches 66% win rate on Tetris against Opus

74 points • by ykhli • yesterday at 6:42 PM • 32 comments

Comments

bubblesorting • yesterday at 7:55 PM

Very cool! I am a good Tetris player (in the top 10% of players) and wanted to give brick yeeting against an LLM a spin.

Some feedback:

- Knowing the scoring system is helpful when going 1v1 high score

- Use a different randomization system, I kept getting starved for pieces like I. True random is fine, throwing a copy of every piece into a bag and then drawing them one by one is better (7 bag), nearly random with some lookbehind to prevent getting a string of ZSZS is solid, too (TGM randomizer)

- Piece rotation feels left-biased, and keeps making me mis-drop, like the T pieces shift to the left if you spin 4 times. Check out https://tetris.wiki/images/thumb/3/3d/SRS-pieces.png/300px-S... or https://tetris.wiki/images/b/b5/Tgm_basic_ars_description.pn... for examples of how other games are doing it.

- Clockwise and counter-clockwise rotation is important for human players, we can only hit so many keys per second

- re-mappable keys are also appreciated

Nice work, I'm going to keep watching.
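
For anyone curious, the two randomizers suggested in the list above can be sketched roughly like this in Python. This is a minimal illustration, not the project's code: the names `seven_bag` and `history_reroll` are made up here, and the `history_len` and `tries` defaults are assumptions loosely modeled on how TGM-style randomizers are usually described.

```python
import random
from typing import Iterator, List, Optional

PIECES = "IJLOSTZ"  # the seven standard tetrominoes


def seven_bag(rng: Optional[random.Random] = None) -> Iterator[str]:
    """Yield pieces forever by shuffling a bag holding one of each piece."""
    rng = rng or random.Random()
    while True:
        bag = list(PIECES)
        rng.shuffle(bag)
        # Every piece appears exactly once before the bag is refilled.
        yield from bag


def history_reroll(
    rng: Optional[random.Random] = None,
    history_len: int = 4,  # assumed value, for illustration only
    tries: int = 4,        # assumed value, for illustration only
) -> Iterator[str]:
    """Yield nearly random pieces, rerolling ones that were dealt recently."""
    rng = rng or random.Random()
    history: List[str] = []
    while True:
        for _ in range(tries):
            piece = rng.choice(PIECES)
            if piece not in history:
                break
        # If every try hit the history, the last roll is accepted anyway.
        history = (history + [piece])[-history_len:]
        yield piece
```

With the bag version, the longest possible wait between two I pieces is 12 other pieces (I first in one bag, last in the next), which is what stops the starvation described above.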

augusteo • today at 2:18 AM

LLMs playing Tetris feels like testing a calculator's ability to write poetry. Interesting as a curiosity, but the results don't transfer to the tasks where these models actually excel.

Curious what the latency looks like per move. That seems like the actual bottleneck here.

ykhli • yesterday at 9:25 PM

Thanks for all the questions! More details on how this works:

- Each model starts with an initial optimization function for evaluating Tetris moves.

- As the game progresses, the model sees the current board state and updates its algorithm—adapting its strategy based on how the game is evolving.

- The model continuously refines its optimizer. It decides when it needs to re-evaluate and when it should implement the next optimization function.

- The model generates updated code, executes it to score all placements, and picks the best move.

- The reason I reframed this as a coding problem is that Tetris is an optimization game by nature. At first I did try asking LLMs where to place each piece at every turn, but models are just terrible at visual reasoning. What LLMs are great at, though, is coding.
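
To make the "score all placements and pick the best move" step concrete, here is a minimal Python sketch of what such an optimization function might look like. It is an assumption-heavy illustration, not the code the models actually generate: the board format, the feature set (lines cleared, total height, holes, bumpiness), the `WEIGHTS` values, and every function name are hypothetical, and pieces are only dropped straight down with no slides or spins.

```python
from typing import Dict, List, Optional, Tuple

Board = List[List[int]]        # rows from top to bottom, 1 = filled cell
Shape = List[Tuple[int, int]]  # (row, col) offsets for one rotation

# Placeholder weights for illustration; an LLM would rewrite these (or the
# whole function) as the game evolves.
WEIGHTS: Dict[str, float] = {
    "lines": 0.75, "height": -0.5, "holes": -0.35, "bumpiness": -0.2,
}


def fits(board: Board, shape: Shape, top: int, left: int) -> bool:
    """True if every cell of the shape is on the board and empty."""
    h, w = len(board), len(board[0])
    return all(
        0 <= top + r < h and 0 <= left + c < w and not board[top + r][left + c]
        for r, c in shape
    )


def drop(board: Board, shape: Shape, left: int) -> Optional[Board]:
    """Hard-drop the shape in a column; None if it cannot enter the board."""
    if not fits(board, shape, 0, left):
        return None
    top = 0
    while fits(board, shape, top + 1, left):
        top += 1
    placed = [row[:] for row in board]
    for r, c in shape:
        placed[top + r][left + c] = 1
    return placed


def clear_lines(board: Board) -> Tuple[Board, int]:
    """Remove full rows and pad the top with empty rows."""
    kept = [row for row in board if not all(row)]
    cleared = len(board) - len(kept)
    empty = [[0] * len(board[0]) for _ in range(cleared)]
    return empty + kept, cleared


def evaluate(board: Board, cleared: int, weights: Dict[str, float]) -> float:
    """Weighted sum of simple board features; higher is better."""
    h, w = len(board), len(board[0])
    heights, holes = [], 0
    for c in range(w):
        col = [board[r][c] for r in range(h)]
        first = next((r for r, v in enumerate(col) if v), h)
        heights.append(h - first)
        holes += sum(1 for v in col[first:] if not v)  # gaps under the top cell
    bumpiness = sum(abs(a - b) for a, b in zip(heights, heights[1:]))
    return (weights["lines"] * cleared + weights["height"] * sum(heights)
            + weights["holes"] * holes + weights["bumpiness"] * bumpiness)


def best_move(board: Board, rotations: List[Shape],
              weights: Dict[str, float]) -> Optional[Tuple[float, int, int]]:
    """Return (score, rotation index, left column) of the best placement."""
    best = None
    for rot, shape in enumerate(rotations):
        for left in range(len(board[0])):
            placed = drop(board, shape, left)
            if placed is None:
                continue
            after, cleared = clear_lines(placed)
            score = evaluate(after, cleared, weights)
            if best is None or score > best[0]:
                best = (score, rot, left)
    return best
```

In this framing the LLM never picks a square itself; it only edits something like `evaluate` or `WEIGHTS`, and the exhaustive search over placements does the rest.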

bityard • yesterday at 10:55 PM

Looks fun, but I'm not willing to give out my email address just to play a game.

Also, if the creator is reading this, you should know that Tetris Holdings is extremely aggressive with their trademark enforcement.

vunderba • yesterday at 8:53 PM

Interesting but frustratingly vague on details. How exactly are the models playing? Is it using some kind of PGN equivalent for Tetris that represents an ongoing game, passing an ASCII representation, encoding the board as a JSON structure, or just directly sending screenshots of the game to the various LLMs?

OGEnthusiast • yesterday at 7:55 PM

Gemini 3 Flash is at a very nice point along the price-performance curve. It's a good workhorse model, and you can supplement it with Opus 4.5 / Gemini 3 Pro for more complex tasks.

burkaman • yesterday at 7:58 PM

It's actually 80% against Opus, 66% average against the 5 models it's tested with.

p0w3n3d • yesterday at 9:47 PM

Guys, I don't know how to tell you but... Tetris can be solved without an LLM...

esafak • yesterday at 8:33 PM

I imagine this is because Tetris is visual and the Gemini models are strong visually.

arendtio • yesterday at 7:57 PM

There are some concepts clashing here.

I mean, if you let the LLM build a Tetris bot, it would be 1000x better than what the LLMs are doing. So yes, it is fun to win against an AI, but to be fair, against such processing power you should not be able to win. It is only possible because LLMs are not built for such tasks.

akomtu • yesterday at 7:52 PM

It would be more interesting to make it build a chess engine and compare it against Stockfish. The chess engine should be a standalone no-dependencies C/C++ program that fits in NNN lines of code.

segmondy • yesterday at 10:03 PM

... and what does this prove? What could you decide to use one LLM for over another based on this TetrisBench, besides playing Tetris?

tiahura • yesterday at 9:16 PM

I'd like to see a nethackbench.

indigodaddy • yesterday at 10:26 PM

Is there a tl;dr on why this is? Does it just make faster decisions?

purplecats • yesterday at 9:59 PM

watch link?
