Future models will know it now, assuming they suck in Mastodon and/or Hacker News.
Although I don't think they'll actually "know" it. This particular trick question will just be in the bank, like the seahorse emoji or how many Rs are in "strawberry". Did they start reasoning and generalising better, or did the publishing of the "trick" and the discourse around it paper over the gap?
I wonder if in the future we'll trade these AI tells like 0-days, keeping them secret so they don't get patched out in the next model update.
The answer can be “both”.
They won’t get this specific question wrong again; but they also generalise, once they have sufficient examples. Patching out a single failure doesn’t do it. Patch out ten equivalent ones, and the eleventh doesn’t happen.