Hacker News

Show HN: TetrisBench – Gemini Flash reaches 66% win rate on Tetris against Opus

74 points by ykhli yesterday at 6:42 PM | 32 comments

Comments

bubblesorting yesterday at 7:55 PM

Very cool! I am a good Tetris player (in the top 10% of players) and wanted to give brick yeeting against an LLM a spin.

Some feedback:

- Knowing the scoring system is helpful when going 1v1 high score

- Use a different randomization system, I kept getting starved for pieces like I. True random is fine, throwing a copy of every piece into a bag and then drawing them one by one is better (7 bag), nearly random with some lookbehind to prevent getting a string of ZSZS is solid, too (TGM randomizer)

- Piece rotation feels left-biased, and keeps making me mis-drop, like the T pieces shift to the left if you spin 4 times. Check out https://tetris.wiki/images/thumb/3/3d/SRS-pieces.png/300px-S... or https://tetris.wiki/images/b/b5/Tgm_basic_ars_description.pn... for examples of how other games are doing it.

- Clockwise and counter-clockwise rotation is important for human players, we can only hit so many keys per second

- re-mappable keys are also appreciated

Nice work, I'm going to keep watching.
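For reference, the 7-bag idea mentioned in the feedback above can be sketched in a few lines. This is only an illustration, not TetrisBench's code: the names `PIECES` and `seven_bag` are made up here, and the optional `seed` parameter is an assumption added so runs can be reproduced.

```python
import random
from itertools import islice
from typing import Iterator, Optional

# The seven standard tetrominoes, by their usual letter names.
PIECES = "IJLOSTZ"


def seven_bag(seed: Optional[int] = None) -> Iterator[str]:
    """Yield pieces forever, dealing one shuffled bag of all seven at a time.

    Hypothetical helper for illustration; `seed` only makes runs repeatable.
    """
    rng = random.Random(seed)
    while True:
        bag = list(PIECES)   # one copy of every piece goes into the bag
        rng.shuffle(bag)     # shuffle the bag in place
        yield from bag       # draw the pieces one by one until it is empty


# Usage: take the first 14 pieces, i.e. exactly two full bags.
first_two_bags = list(islice(seven_bag(seed=0), 14))
```

Because every bag holds each piece exactly once, at most 12 other pieces can fall between two I pieces (I first in one bag, last in the next), which is what prevents the long droughts described above.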

augusteo today at 2:18 AM

LLMs playing Tetris feels like testing a calculator's ability to write poetry. Interesting as a curiosity, but the results don't transfer to the tasks where these models actually excel.

Curious what the latency looks like per move. That seems like the actual bottleneck here.

ykhli yesterday at 9:25 PM

Thanks for all the questions! More details on how this works:

- Each model starts with an initial optimization function for evaluating Tetris moves.

- As the game progresses, the model sees the current board state and updates its algorithm—adapting its strategy based on how the game is evolving.

- The model continuously refines its optimizer. It decides when it needs to re-evaluate and when it should implement the next optimization function.

- The model generates updated code, executes it to score all placements, and picks the best move.

- The reason I reframed this as a coding problem is that Tetris is an optimization game by nature. At first I tried asking LLMs where to place each piece at every turn, but models are just terrible at visual reasoning. What LLMs are great at, though, is coding.
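The model-generated optimization functions are not shown in this thread, so the following is only a rough sketch of what "score all placements and pick the best move" could look like. It assumes a common hand-written heuristic (aggregate height, holes, bumpiness, full rows); the function names, the board encoding, and the weights are all made up for illustration.

```python
from typing import List, Sequence, Tuple

# Assumed encoding: rows listed top to bottom, 1 = filled cell, 0 = empty.
Board = List[List[int]]


def column_heights(board: Board) -> List[int]:
    """Height of each column, measured from the floor to its topmost block."""
    rows = len(board)
    heights = []
    for col in range(len(board[0])):
        height = 0
        for row in range(rows):
            if board[row][col]:
                height = rows - row
                break
        heights.append(height)
    return heights


def count_holes(board: Board) -> int:
    """Count empty cells that have at least one filled cell above them."""
    holes = 0
    for col in range(len(board[0])):
        seen_block = False
        for row in range(len(board)):
            if board[row][col]:
                seen_block = True
            elif seen_block:
                holes += 1
    return holes


def evaluate(board: Board,
             weights: Tuple[float, float, float, float] = (-0.5, -0.35, -0.2, 0.75)) -> float:
    """Score a board after a placement; higher is better. Weights are illustrative."""
    heights = column_heights(board)
    bumpiness = sum(abs(a - b) for a, b in zip(heights, heights[1:]))
    full_rows = sum(1 for row in board if all(row))  # rows about to be cleared
    w_height, w_holes, w_bump, w_lines = weights
    return (w_height * sum(heights) + w_holes * count_holes(board)
            + w_bump * bumpiness + w_lines * full_rows)


def pick_best(candidates: Sequence[Board]) -> int:
    """Return the index of the highest-scoring candidate board."""
    return max(range(len(candidates)), key=lambda i: evaluate(candidates[i]))


# Usage: two hypothetical results of placing the same piece on a 4x4 board.
flat = [[0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0], [1, 1, 1, 0]]
holey = [[0, 0, 0, 0], [0, 0, 0, 0], [1, 1, 1, 0], [1, 0, 1, 0]]
best_index = pick_best([holey, flat])  # prefers the flat board without a hole
```

In the setup described above, the model would presumably rewrite something like `evaluate` (or its weights) as the game evolves, while the surrounding loop that enumerates placements stays fixed; that split is a guess, not something confirmed in the thread.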

bityard yesterday at 10:55 PM

Looks fun, but I'm not willing to give out my email address just to play a game.

Also, if the creator is reading this, you should know that Tetris Holdings is extremely aggressive with their trademark enforcement.

vunderba yesterday at 8:53 PM

Interesting but frustratingly vague on details. How exactly are the models playing? Is it using some kind of PGN equivalent for Tetris that represents an ongoing game, passing an ASCII representation, encoding the board as a JSON structure, or just directly sending screenshots of the game to the various LLMs?

OGEnthusiast yesterday at 7:55 PM

Gemini 3 Flash is at a very nice point along the price-performance curve. It makes a good workhorse model, supplemented with Opus 4.5 / Gemini 3 Pro for more complex tasks.

burkaman yesterday at 7:58 PM

It's actually 80% against Opus, 66% average against the 5 models it's tested with.

p0w3n3d yesterday at 9:47 PM

Guys, I don't know how to tell you but... Tetris can be solved without an LLM...

esafak yesterday at 8:33 PM

I imagine this is because Tetris is visual and the Gemini models are strong visually.

arendtio yesterday at 7:57 PM

There are some concepts clashing here.

I mean, if you let the LLM build a Tetris bot, it would be 1000x better than what the LLMs are doing. So yes, it is fun to win against an AI, but to be fair, against such processing power you should not be able to win. It is only possible because LLMs are not built for such tasks.

akomtu yesterday at 7:52 PM

It would be more interesting to make it build a chess engine and compare it against Stockfish. The chess engine should be a standalone no-dependencies C/C++ program that fits in NNN lines of code.

segmondy yesterday at 10:03 PM

... and what does this prove? Besides playing Tetris, what task would you pick one LLM over another for based on this TetrisBench?

tiahura yesterday at 9:16 PM

I'd like to see a nethackbench.

indigodaddy yesterday at 10:26 PM

Is there a tl;dr on why this is? Does it just make faster decisions?

purplecats yesterday at 9:59 PM

watch link?