Hacker News

hn8726 today at 5:23 PM, 9 replies

Genuine question: what's the goal of posting this on almost every single new model thread here on HN? I may be old and grumpy, but to me it got old a while ago, and it's closer to a low-effort Reddit comment.


Replies

lambda today at 5:37 PM

It's a lighthearted, fun, visual benchmark that's not part of the standard benchmarks; and at least traditionally, it was not something that the labs trained on so it was something of a measure of how well the intelligence of the model generalized. Part of the idea of LLMs is that they pick up general knowledge and reasoning ability, beyond any tasks that they are specifically trained for, from the vast quantity of data that they are trained on.

Of course, a while back there was a Gemini release that I believe specifically called out their ability to produce SVGs, for illustration and diagramming purposes. So it's no longer necessarily the case that the labs aren't training on generating SVGs, and in fact, there's a good chance that even if they're not doing so explicitly, the RLVR process might be generating tasks like that, as there is more and more focus on frontend and design in the LLM space. So while they might not be specifically training for a pelican riding a bicycle, they may actually be training on SVG diagram quality.

nickthegreek today at 6:02 PM

This isn't even a normal pelican image post; this one created the HTML control system that animates how far the wing travels from its pivot, in time with the rotation speed of the wheel. Let's not pretend this is a solved problem and models are dumping out perfect pelicans on bikes one after another (or ever?).

Surely you know someone makes the same post you did every time one is posted. Surely you see the answers and pushback, since you are familiar with these posts. Genuine question: did you expect a different answer this time?

walthamstow today at 6:44 PM

It's a great filter for people who take things far too seriously

Strom today at 6:50 PM

It's tradition at this point. Based on the upvotes the comment receives, it looks like many readers find value in it.

rolymath today at 6:27 PM

I agree, I'm sick of this repetition. It's not even a good test; it's so dumb.

Mashimo today at 6:30 PM

I, for one, find it entertaining.

wotsdat today at 5:40 PM

[dead]

snendroid-ai today at 6:52 PM

Agreed! Whenever I see a new model release, this guy comes running over with his stupid "hey guys look over here how this model made the pelicans-on-a-bicycle!" I mean, some are good, some are stupid and some are interesting. But that tells me exactly nothing about the model. It just feels like this has become the Pete Davidson of model evaluation. NO ONE CARES!
