Hacker News

sempron64 yesterday at 5:44 PM

The pelican has looked very same-y across all frontier models, same color bike, same camera angle, etc. I suspect this challenge is already too embedded in the training data to be a good signal when it succeeds, and maybe even when it fails in pathological ways mirroring existing AI pelicans on the internet.


Replies

tripleee yesterday at 6:03 PM

I'd say it's working great for its intended purpose. Keeps Simon on top of all these threads and funnels traffic to his site.

Fuzzwah today at 3:49 AM

The "big beak!" comment in the SVG source makes me think it's definitely a gamed "benchmark" at this point.

h4ny yesterday at 7:21 PM

Was it ever a good test? How do you even objectively assess what a good pelican on a bike is anyway?

kayge yesterday at 8:28 PM

Do you think the models are ready for the next level? I believe that would be: Pelican feeding Spaghetti to Will Smith.

stratos123 yesterday at 10:18 PM

I'd be very surprised if this is in the training data given that most models mess it up to this day. E.g. look at the ones from Opus.

quantumwoke yesterday at 6:40 PM

Variations of this comment have been posted for over a year. The pelican has now morphed into part of HN culture rather than a legitimate benchmark, but it's still valuable as a meme.
