The question isn't about universal novelty, but whether the prompt/context is novel enough...

hackinthebochs • last Tuesday at 12:43 AM • 1 reply

The question isn't about universal novelty, but whether the prompt/context is novel enough that the LLM answering competently demonstrates understanding. The claim of parroting is that the dataset contains a near-exact duplicate of any prompt, so what appears to be competence is really just memorization. But if an LLM can generalize from an elephant mirror test to an LLM mirror test in an entirely new context (being shown pictures and asked to describe them), that demonstrates sufficient generalization to "understand" the concept of a mirror test.

Replies

MichaelZuo • last Wednesday at 9:27 PM

How do you know it’s the one generalizing?

Likely there has been at least one text that already does that for, say, dolphin mirror tests or chimpanzee mirror tests.
