It passed all the tests.
If you can't trust your test suite to catch an automatic language translation you shouldn't trust it at all. :)
What if we only trusted the test suite a reasonable amount, instead of pretending trust must either be blindly total or nonexistent?
It also modified many of the tests to make them pass in mischievous ways. You can't trust a test suite to catch regressions if the new version doesn't use the same test suite.
Tests can only prove the presence of bugs, but not their absence. If the AI can access the tests, it can easily make them pass by just adding additional if statements. It doesn't mean the code is actually correct.