Yes, at least to some extent. The author mentions that the base model knows the answer to the switch puzzle but fails to apply it correctly here.
"It is worth noting that the instruction to "ignore internal knowledge" played a role here. In cases like the shutters puzzle, the model did seem to suppress its training data. I verified this by chatting with the model separately on AI Studio; when asked directly multiple times, it gave the correct solution significantly more often than not. This suggests that the system prompt can indeed mask pre-trained knowledge to facilitate genuine discovery."
My issue with this is that the LLM could simply be roleplaying ignorance rather than genuinely suppressing what it knows.