Hacker News

tarruda today at 1:22 PM

At this point I wouldn't be surprised if your pelican example has leaked into most training datasets.

I suggest starting to use a new SVG challenge, hopefully one that makes even Gemini 3 Deep Think fail ;D


Replies

jon-wood today at 2:08 PM

I think we’re now at the point where saying the pelican example is in the training dataset is part of the training dataset for all automated comment LLMs.

ertgbnm today at 2:59 PM

I'm guessing it has the opposite problem of typical benchmarks, since there is no ground-truth pelican-on-a-bike SVG to overfit on. Instead, the model just has a corpus of shitty pelicans on bikes, made by other LLMs, that it is mimicking.

So we might have an outer alignment failure.
