
oceansky | yesterday at 3:23 PM

"Crucially, it tells the agent not to rely on its internal training data (which might be hallucinated or refer to a different version of the game) but to ground its knowledge in what it observes. "

Does this even have any effect?


Replies

ragibson | yesterday at 3:33 PM

Yes, at least to some extent. The author mentions that the base model knows the answer to the switch puzzle but does not execute it properly here.

"It is worth noting that the instruction to "ignore internal knowledge" played a role here. In cases like the shutters puzzle, the model did seem to suppress its training data. I verified this by chatting with the model separately on AI Studio; when asked directly multiple times, it gave the correct solution significantly more often than not. This suggests that the system prompt can indeed mask pre-trained knowledge to facilitate genuine discovery."

tootyskooty | yesterday at 3:30 PM

I'm wondering about this too. Would be nice to see an ablation here, or at least see some analysis on the reasoning traces.

It definitely doesn't wipe its internal knowledge of Crystal clean (that's not how LLMs work). My guess is that it slightly encourages the model to explore more and second-guess its likely very strong knowledge of the game, but that's about it.
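As a very rough first pass (not the author's harness, just a sketch: the calls assume the google-generativeai SDK, and the model name, puzzle wording, and answer check are all placeholders), an ablation could be as simple as asking the same question N times with and without the grounding instruction and comparing hit rates:

    import google.generativeai as genai

    genai.configure(api_key="...")  # your API key

    BASE = ("You are playing Pokemon Crystal. Answer the question about the "
            "puzzle below.")
    GROUNDED = BASE + (" Do not rely on your internal knowledge of the game; "
                       "ground your answer only in what you have observed.")
    QUESTION = "How do you solve the shutters puzzle?"  # placeholder wording

    def looks_correct(answer: str) -> bool:
        return "switch" in answer.lower()  # placeholder answer check

    N = 20
    for name, system_prompt in (("baseline", BASE), ("grounded", GROUNDED)):
        model = genai.GenerativeModel("gemini-2.5-pro",
                                      system_instruction=system_prompt)
        hits = sum(looks_correct(model.generate_content(QUESTION).text)
                   for _ in range(N))
        print(f"{name}: {hits}/{N} correct")

That only measures Q&A recall rather than in-game behavior, but it would at least show whether the instruction suppresses the pre-trained answer at all.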

MrCheeze | today at 8:31 AM

It's hard to say for sure, because Gemini 3 was only tested with this prompt. But for Gemini 2.5, which is what the prompt was originally written for, yes, this does cut down on bad assumptions. A specific example: the puzzle with Farfetch'd in Ilex Forest is completely different in the DS remake of the game, and models love to hallucinate elements from the remake's puzzle if you don't emphasize the need to distinguish hypotheses from things they actually observe.

raincole | yesterday at 4:38 PM

It will definitely have some effect. Why wouldn't it? Even adding noise to a prompt (like saying you will be rewarded $1000 for each correct answer) has some effect.

Whether the 'effect' is the one implied by the prompt, or even something we can understand, is a totally different question.

blibble | yesterday at 3:28 PM

I very much doubt it

baby | yesterday at 4:08 PM

Do we have examples of this kind of instruction in prompts in other contexts?

elif | yesterday at 6:28 PM

I would imagine that prompting like this has a rather ironic effect: it convinces the model to suppress patterns it would consider to be pre-knowledge.

If you looked inside, it would be spinning on something like: "oh, I know this is the tile to walk on, but I have to rely only on what I observe! I will do another task instead to satisfy my conditions and not reveal that I have pre-knowledge."

LLMs are literal douche genies. The less you say, generally, the better.

astrange | yesterday at 4:24 PM

If they trained the model to respond to that instruction, then it can respond to it; otherwise it can't necessarily.

mkoubaa | yesterday at 4:10 PM

It might get things wrong on purpose, but deep down it knows what it's doing.