Tests can only prove the presence of bugs, but not their absence. If the AI can access the tests, it can easily make them pass by just adding additional if statements. It doesn't mean the code is actually correct.