Hacker News

thatwasunusual today at 2:19 AM

> If a model draws a really good picture of a pelican riding a bicycle there's a solid chance it will be great at all sorts of other things.

Why?

If I hired a worker that was really good at drawing pelicans riding a bike, it wouldn't tell me anything about his/her other qualities?!


Replies

theshrike79 today at 11:48 AM

What if the employee can draw a bike and a pelican, but not a pelican on a bike?

suspended_state today at 7:22 AM

Your comment is funny, but please note: it's not drawing a pelican riding a bike, it's describing in SVG a pelican riding a bike. Your candidate would at least display some knowledge of the SVG specs.

simonw today at 3:10 AM

I wish I knew why. I didn't think it would be a useful indicator of model skills at all when I started doing it, but over time the pattern has held that performance on pelican riding a bicycle is a good indicator of performance on other tasks.

falcor84 today at 10:22 AM

For better or worse, a lot of job interviews actually do use contrived questions like this, such as the infamous "how many golf balls can you fit in a 747?"

vikramkr today at 4:43 AM

The difference is that the worker you hire would be a human being and not a large matrix multiplication that had parameters optimized by a gradient descent process and embeds concepts in a higher dimensional vector space that results in all sorts of weird things like subliminal learning (https://alignment.anthropic.com/2025/subliminal-learning/).

It's not a human intelligence - it's a totally different thing, so why would the same test that you use to evaluate human abilities apply here?

Also, more directly, the "all sorts of other things" we want LLMs to be good at often involve writing code, spatial reasoning, and world understanding, which creating an SVG of a pelican riding a bicycle very directly evaluates, so it's not even that surprising?

jtbaker today at 2:53 AM

a posteriori knowledge. the pelican isn't the point, it's just amusing. the point is that Simon has seen a correlation between this skill and the model's general capabilities.