I ran these in LM Studio and got unrecognizable pelicans out of the 2B and 4B models and an outstanding pelican out of the 26b-a4b model - I think it's the best I've seen from a model that runs on my laptop.
https://simonwillison.net/2026/Apr/2/gemma-4/
The gemma-4-31b model is completely broken for me - it just spits out "---\n" no matter what prompt I feed it. I got a pelican out of it via the AI Studio API hosted model instead.
Your pelican benchmark posts are honestly the biggest reason I check the Hacker News comments on big new model announcements.
I'd recommend using the instruction-tuned variants; the pelicans would probably look a lot better.
Mind if I ask what your laptop is and how it's configured, hardware-wise?
Do you have a single gallery page where we can see all the pelicans together? I'm thinking something similar to
https://clocks.brianmoore.com/
but static.