tarruda • today at 1:22 PM • 2 replies

At this point I wouldn't be surprised if your pelican example has leaked into most training datasets.

I suggest you start using a new SVG challenge, hopefully one that makes even Gemini 3 Deep Think fail ;D

Replies

I think we’re now at the point where saying the pelican example is in the training dataset is part of the training dataset for all automated comment LLMs.

ertgbnm • today at 2:59 PM

I'm guessing it has the opposite problem of typical benchmarks, since there is no ground-truth pelican bike SVG to overfit on. Instead the model just has a corpus of shitty pelicans on bikes made by other LLMs that it is mimicking.

So we might have an outer alignment failure.
