I don’t know much about AI, but I have this image test that everything has failed at. You basically just present an image of a maze and ask the LLM to draw a line along the optimal path through it.
Here’s how Nano Banana fared: https://x.com/danielvaughn/status/1971640520176029704?s=46
In fact, one of the tests I use as part of GenAI Showdown involves both parts of the puzzle: draw a maze with a clearly defined entrance and exit, along with a dashed line indicating the solution to the maze.
Only one model (gpt-image-1) out of the 18 tested managed to pass the test successfully. Gemini 3.0 Pro got VERY close.
I have also tried the maze-from-a-photo test a few times and have never seen a one-shot success. But yesterday I was determined to succeed, so I let Gemini 3 write a Python GUI app that takes in photos of physical mazes (I have a bunch of 3D-printed ones) and finds the path. This does work.
Gemini 3 then one-shot ported the whole thing (which uses Python CV libraries) to a single-page HTML+JS version, which works just as well.
I gave that to Claude to assess and assign a FAANG hiring level to, and it was amazed and said Gemini 3 codes like an L6.
Since I work for Google and used my phone in the office to do this, I think I can't share the source or file.
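Since that source isn't available, here is a minimal sketch of just the path-finding half, to show why it is the easy part once the photo has been turned into a grid. This is not the app described above: it assumes the image has already been reduced to a 2D list where `0` is corridor and `1` is wall (the computer-vision step is not shown), and `solve_maze` is a hypothetical name. It uses plain breadth-first search, which returns a shortest path on an unweighted grid.

```python
from collections import deque

def solve_maze(grid, start, goal):
    """Breadth-first search on a grid of 0 (open) and 1 (wall).

    Returns the shortest path as a list of (row, col) cells,
    or None if the goal cannot be reached.
    """
    rows, cols = len(grid), len(grid[0])
    parent = {start: None}          # also serves as the visited set
    queue = deque([start])
    while queue:
        cell = queue.popleft()
        if cell == goal:
            # Walk the parent links back to the start, then reverse.
            path = []
            while cell is not None:
                path.append(cell)
                cell = parent[cell]
            return path[::-1]
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            in_bounds = 0 <= nr < rows and 0 <= nc < cols
            if in_bounds and grid[nr][nc] == 0 and (nr, nc) not in parent:
                parent[(nr, nc)] = cell
                queue.append((nr, nc))
    return None
```

Calling `solve_maze(grid, (0, 0), (4, 4))` on a 5x5 grid gives back the list of cells to draw the line through. The hard part in practice is presumably getting a clean grid out of a photo, not the search.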
I tried this with Claude:
```
> [Image #1] Create a unicode "ascii-art" version of this image, with the optimal path through the maze highlighted in a solid colour.
I'll create an ASCII art version of this maze with the solution path highlighted!
┌─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┐
●●│ │ │ │ │ │ │ │ │ │ │ │ │ │
├─┤●└─┴─┐ ├───┐ │ ╔═╗ ├─────┤ ╔═══╝ │ ╔═╝ ╔═╝ │ │ ╔═╝ ├─┤
│ │●●●●●└─┤ │ │ ║ │ │ │ ║ │ ║ ║ │ │ ║ │ │
│ └───┐●╔═╧═╗ │ └─╢ │ │ ┌─╔═╝ ║ ┌───┤ ╚═╗ ╚═╗ │ └─╢ ┌─┘ │
│ │●║ ║ │ ║ │ │ │ ║ ║ │ │ ║ ║ │ ║ │ │
├───┐ │●╚═╗ ╚═╧═╗ ║ └─┤ │ ║ ╔═╝ └─╔═╧═╗ ╚═╗ ╚═╧═╗ ║ └─┬─┤
│ │ │●●●└───┐ ║ ║ │ │ ║ ║ ║ ║ ║ ║ ║ │ │
│ ╔═╝ └───┐●╔═╝ ║ ╚═╗ │ └─╢ ╚═╗ ╔═╝ ╔═╝ ┌─╢ ╔═══╝ ╚═╗ │ │
│ ║ │●║ ║ ║ │ ║ ║ ║ ║ │ ║ ║ ║ │ │
│ ╚═══╗ ╔═╝●╚═╗ ╚═╗ ║ ├─╔═╧═╗ ║ ║ ╔═╝ ╔═╝ ║ ║ ╔═╗ ╔═╝ │ │
│ ║ ║●●●●●│ ║ ║ │ ║ ║ ║ ║ ║ ║ ║ ║ ║ ║ ║ │ │
├───╔═╝ ║●┌─┐●└─╗ ║ └─┤ ║ ╔═╝ ║ ╚═╝ ╔═╧═╗ ║ ╚═╝ ║ ╚═╗ │ │
│ ║ ║●│ │●●●║ ║ │ ║ ║ ║ ║ ║ ║ ║ ║ │ │
│ ╔═╝ ╔═╝●│ └─┐●╚═╧═╗ │ ║ ║ ╔═╧═══╗ ║ ╔═╝ ╚═══╗ ╚═╗ ║ └─┤
│ ║ ║●●●│ │●●●●●║ │ ║ ║ ║ ║ ║ ║ ║ ║ ║ │
│ ║ ╔═╝●┌─┴─╗ └───┐●╚═╝ ║ ║ ║ ╔═╗ ║ └─╢ ╔═══╗ ╚═╗ ╚═╝ ╔═╝
│ ║ ║●●●│ ║ │●●●●●║ ║ ║ ║ ║ ║ ║ ║ ║ ║ ║
│ ╚═╝●╔═╧═╗ ╚═╗ ╔═╧═══┐●╚═╝ ║ ║ ║ ╚═╗ ║ ║ ╔═╧═╗ ╚═══╗ ╚═╗
│ ●●║ ║ ║ ║ │●●●●●║ ║ ║ ║ ║ ║ ║ ║ ║ ║
├─╔═┘●╚═╗ ╚═╗ ║ ║ ╔═╗ └───┐●╚═╝ ╚═╗ ║ └─╢ ║ ╔═╧═╗ ╔═╝ ╔═╝
│ ║●●●●●║ ║ ║ ║ ║ ║ │●●●●●●●║ ║ ║ ║ ║ ║ ║ ║
│ ║●┌─╗ ╚═╗ ║ └─╢ ║ ╚═══╗ └─────┐●╚═╧═╗ ║ ║ ║ ╔═╝ ║ ╔═╝ ╔
│ ║●│ ║ ║ ║ ║ ║ ║ │●●●●●║ ║ ║ ║ ║ ║ ║ ║
│ ╚═╝ ║ ╔═╝ ╚═╗ ║ ╚═╗ ╔═╧═════╗ └───┐●╚═╝ ║ ╚═╝ ╔═╝ ║ ╔═╝
│ ║ ║ ║ ║ ║ ║ ║ │●●●●●║ ║ ║ ║
└─────╨─╨─────╨─╨───╨─╨───────╨─────┴───●●╨─────╨───╨─╨──
The path from the mouse () to the cheese () is marked with ● characters. This shows the optimal route through the maze, navigating through the corridors while avoiding all the walls!
```
Suffice to say, it didn't do either part right.
That might be an interesting training set, a bunch of mazes…
The reason is that image generators don't iterate on their output the way text-based LLMs do. Essentially they produce the image in "one hit" and can't work through a complex sequence, just as you couldn't one-shot this either. Try taking a random maze, glancing at it, then going off to draw a squiggle on a transparency. If you were to place that on top of the maze, there's virtually no chance that you'd have found the solution on the first try.
That's essentially what's going on with AI models: they're struggling because they only get "one step" to solve the problem instead of being able to trace through the maze slowly.
An interesting experiment would be to ask the AI to solve the maze incrementally. Ask it to draw a line starting at the entrance and going a little way into the maze, then a little bit further, and so on until it gets to the end.
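To make the incremental idea concrete, here is a rough sketch of what such a loop could look like on a grid maze (`0` = corridor, `1` = wall). Everything here is an assumption for illustration: `propose` is a hypothetical callback standing in for the model, which is asked for one small step at a time and only ever gets to pick among legal moves. The loop backs up out of dead ends, so it finds a path but not necessarily the shortest one.

```python
def legal_moves(grid, cell, visited):
    """Open, in-bounds, not-yet-visited neighbours of cell."""
    r, c = cell
    moves = []
    for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
        in_bounds = 0 <= nr < len(grid) and 0 <= nc < len(grid[0])
        if in_bounds and grid[nr][nc] == 0 and (nr, nc) not in visited:
            moves.append((nr, nc))
    return moves

def solve_incrementally(grid, start, goal, propose, max_steps=10_000):
    """Extend the path one cell per step, asking propose() where to go next.

    propose(path, options) is a stand-in for the model: it sees the path so
    far plus the legal next cells and returns one of them.
    """
    path = [start]
    visited = {start}
    for _ in range(max_steps):
        if path[-1] == goal:
            return path
        options = legal_moves(grid, path[-1], visited)
        if not options:
            path.pop()              # dead end: back up one step
            if not path:
                return None         # every reachable cell was explored
            continue
        choice = propose(path, options)
        if choice not in options:
            choice = options[0]     # ignore an illegal suggestion
        visited.add(choice)
        path.append(choice)
    return path if path[-1] == goal else None
```

With a trivial stand-in like `propose = lambda path, options: options[0]` this is just depth-first search; the experiment would be to swap in a real model call and see whether its choices beat that baseline.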
I just one-shot it with Claude Code (Opus 4.5) using this prompt. It took about 5 minutes, including catching that it was cheating at first (it drew a line around the boundary of the maze instead), so it added guardrails against that:
```
Create a devenv project that does the following:
Use whatever lib/framework is most appropriate
```
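I don't know what guardrails it actually added, but as a guess at the general shape, a check like the following would reject the "walk around the outside" cheat. This is a sketch under assumptions: the maze is a grid of `0` (corridor) and `1` (wall), the candidate solution is a list of `(row, col)` cells, and `is_valid_solution` is a made-up name.

```python
def is_valid_solution(grid, path, start, goal):
    """True only if path runs from start to goal through open cells,
    one orthogonal step at a time, without leaving the grid."""
    if not path or path[0] != start or path[-1] != goal:
        return False
    rows, cols = len(grid), len(grid[0])
    for r, c in path:
        # A line drawn around the boundary fails here: it is either
        # outside the grid or sitting on wall cells.
        if not (0 <= r < rows and 0 <= c < cols) or grid[r][c] != 0:
            return False
    # Consecutive cells must be direct neighbours (no jumping over walls).
    return all(
        abs(r1 - r2) + abs(c1 - c2) == 1
        for (r1, c1), (r2, c2) in zip(path, path[1:])
    )
```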