HaZeust • yesterday at 5:23 PM • 3 replies

I've seen this reply to Simon's benchmark for 2 years running now, and yet you still see improvements and objectively-bad results over time from new releases, even when I'm sure every frontier AI team has/had a person at least partially dedicated to better bicycle-pelican SVG outputs. Alas.

Replies

mrandish • today at 6:32 AM

Hence it has become a meta-benchmark of relative progress in SVG image generation for a known target, one which has leaked into the training data and for which "every frontier AI team has/had a person at least partially dedicated to" at least checking, if not optimizing.

sarreph • yesterday at 5:26 PM

I had intended to caveat that: I'm sure I'm not the first person to ask about this!

> you still see improvements

This is expected if they are training their models on it, right?

> objectively-bad results

Keen to learn when this has been the case, i.e. across version increments in major models.

llm_nerd • yesterday at 5:32 PM

I honestly assumed their comment was tongue in cheek humour, because positively no one actually cares how these models generate an SVG pelican riding a bicycle. It's some meme thing that this stuff always appears here.
