Hacker News

Gemini 3 Deep Think drew me a good SVG of a pelican riding a bicycle

107 points by stared today at 7:47 PM | 45 comments

Comments

segmondy today at 8:20 PM

For those claiming they rigged it: do you have any concrete evidence? What if the models have just gotten really good?

I just asked Gemini Pro to generate an SVG of an octopus dunking a basketball and it did a great job, and that is not even the Deep Think model. Then I tried "generate an svg of raccoon at a beach drinking a beer". You can go try this out yourself: ask it to generate anything you want in SVG. Use your imagination.

Rant: This is why AI is going to take over; folks are not even trying in the least.

vessenes today at 8:07 PM

Simon notes this benchmark is win-win, since he loves pictures of pelicans riding bicycles — if they spend time benchmaxxing it’s like free pelicans for him.

He originally promised to generate a bunch more animals when we got a “good” pelican. This is not a good pelican. This is an OUTSTANDING pelican, a great bicycle, and it even has a little sun ray over the ocean marked out. I’d like to see more animals please Simon!

rustyhancock today at 7:58 PM

The competition between models is so intense right now that they are definitely benchmaxxing pelican-on-a-bike SVGs and Will Smith spaghetti dinner videos.

rcarmo today at 8:06 PM

I don't think this is a good "benchmark" anymore. It's probably in everyone's training set by now.

WarmWash today at 9:03 PM

Are AI labs training on the bike Pelican?

From the blog:

>The strongest argument is that they would get caught. If a model finally comes out that produces an excellent SVG of a pelican riding a bicycle you can bet I’m going to test it on all manner of creatures riding all sorts of transportation devices. If those are notably worse it’s going to be pretty obvious what happened.

He mentioned in the Deep Think thread the other day that the results on his secret test set were also impressive.

alestainer today at 8:51 PM

Interesting thing: I have an internal test request that is similar to this pelican, and there has been zero progress on it in the past ~2 years. That difference might have at least a couple of explanations. 1. Spillage into pre-training: some real artist has drawn a pelican riding a bicycle. 2. The challenge being discussed in the training data as an important marker of model intelligence might affect how much compute goes into solving it, either through engineers or through the model itself finding the texts about this challenge.

Springtime today at 8:45 PM

I have wondered whether these tests will reach a point where online models cheat by generating a line-art raster reference, then deciding behind the scenes how to vectorize it in the most minimalist way (e.g., using strokes and shape elements rather than naively using path outlines for all forms).

aidos today at 8:07 PM

The bicycles are getting pretty cyclable now. I’m enjoying this pelican that’s already sliced and ready to bbq.

bfung today at 8:34 PM

In the spirit of the Winter Olympics, I vote “Lion on a bobsled” for the next bench. :)

stephc_int13 today at 8:37 PM

Many tests are asymmetrical. They can reliably show an issue/abnormality but they are a lot less reliable on the other side of the curve.

manojlds today at 8:14 PM

It's funny how I can tell where the post is from just by looking at the title (and it's not just about pelicans).

tylervigen today at 8:45 PM

That’s among the most artistic SVGs I’ve ever seen, period.

throwaway333444 today at 8:05 PM

Since it’s a* FAQ… Also that pelican is pretty fly

kittbuilds today at 8:14 PM

SVG generation is a surprisingly good benchmark for spatial reasoning because it forces the model to work in a coordinate system with no visual feedback loop. You have to hold a mental model of what the output looks like while emitting raw path data and transforms. It's closer to how a blind sculptor works than how an image diffusion model works.

What I find interesting is that Deep Think's chain-of-thought approach helps here — you can actually watch it reason about where the pedals should be relative to the wheels, which is something that trips up models that try to emit the SVG in one shot. The deliberative process maps well to compositional visual tasks.

bulletsvshumans today at 8:06 PM

They rigged it.
