You don’t need a test to know this: we already know there’s heavy reinforcement training done on these models, so they optimize for passing that training. And passing the training means convincing the person rating the answers that the answer is good.
The key word is convince. The model just needs to convince people that it’s right.
It is optimizing for convincingness. Of all the answers that can convince people, some happen to be correct and others are wrong.
Yet people often forget this. We don't have mathematical models of truth, beauty, or many other abstract things, so we proxy them with "I know it when I see it." That's a decent proxy for lack of anything better, but it creates a known danger: the proxy helps the model optimize for the answers we want, yet unless we're incredibly careful it also optimizes for deception, because a convincing-but-wrong answer scores just as well as a correct one.
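To make the proxy problem concrete, here's a minimal toy sketch in Python. To be clear, this is not how RLHF reward models actually work; the candidate pool, the 50/50 weighting, and all the numbers are invented for illustration. The point is just that when the score a rater assigns only partly tracks correctness, picking the highest-scoring answer rewards whatever inflates the score independently of correctness:

```python
import random

random.seed(0)

NUM_CANDIDATES = 1000

# Each candidate answer has a hidden true correctness and a separate
# "convincingness" that the rater actually sees. Convincingness only
# partly tracks correctness; the rest is noise (style, confidence, etc.).
candidates = []
for _ in range(NUM_CANDIDATES):
    correctness = random.random()                               # hidden: is it actually right?
    convincingness = 0.5 * correctness + 0.5 * random.random()  # what the rater perceives
    candidates.append((correctness, convincingness))

# "Training" keeps whatever scores highest on the proxy the rater sees.
picked_by_proxy = max(candidates, key=lambda c: c[1])
picked_by_truth = max(candidates, key=lambda c: c[0])

print(f"picked by proxy: correctness={picked_by_proxy[0]:.2f}, "
      f"convincingness={picked_by_proxy[1]:.2f}")
print(f"picked by truth: correctness={picked_by_truth[0]:.2f}, "
      f"convincingness={picked_by_truth[1]:.2f}")
```

The harder you select on the proxy, the more of the winner's score comes from the noise term rather than from correctness (Goodhart's law), which is exactly the mechanism above: convincing-but-wrong answers survive the selection.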
This makes them frustrating and potentially dangerous tools. How do you validate a system optimized to deceive you? It takes a lot of effort! I don't understand why we are so cavalier about this.