"Crucially, it tells the agent not to rely on its internal training data (which might be hallucinated or refer to a different version of the game) but to ground its knowledge in what it observes. "
Does this even have any effect?
I'm wondering about this too. Would be nice to see an ablation here, or at least see some analysis on the reasoning traces.
It definitely doesn't wipe its internal knowledge of Crystal clean (that's not how LLMs work). My guess is that it slightly encourages the model to explore more and second-guess it's likely very-strong Crystal game knowledge but that's about it.
It's hard to say for sure because Gemini 3 was only tested with this prompt. But for Gemini 2.5, which is who the prompt was originally written for, yes this does cut down on bad assumptions (a specific example: the puzzle with Farfetch'd in Ilex Forest is completely different in the DS remake of the game, and models love to hallucinate elements from the remake's puzzle if you don't emphasize the need to distinguish hypothesis from things it actually observes).
It will definitely have some effect. Why won't it? Even adding noise into prompts (like saying you will be rewarded $1000 for each correct answer) has some effect.
Whether the 'effect' something implied by the prompt, or even something we can understand, is a totally different question.
I very much doubt it
Do we have examples of this in promps in other contexts?
I would imagine that prompting anything like this will have an excessively ironic effect like convincing it to suppress patterns which it would consider to be pre-knowledge.
If you looked inside they would be spinning on something like "oh I know this is the tile to walk on, but I have to only rely on what I observe! I will do another task instead to satisfy my conditions and not reveal that I have pre-knowledge.
LLMs are literal douche genies. The less you say, generally, the better
If they trained the model to respond to that, then it can respond to that, otherwise it can't necessarily.
It might get things wrong on purpose, but deep down it knows what it's doing
Yes, at least to some extent. The author mentions that the base model knows the answer to the switch puzzle but does not execute it properly here.
"It is worth noting that the instruction to "ignore internal knowledge" played a role here. In cases like the shutters puzzle, the model did seem to suppress its training data. I verified this by chatting with the model separately on AI Studio; when asked directly multiple times, it gave the correct solution significantly more often than not. This suggests that the system prompt can indeed mask pre-trained knowledge to facilitate genuine discovery."