You didn’t do it enough. They stop finding bugs eventually. Also, different models can find different bugs (though they do find the same ones, too, which is good and expected). For best results you want to run multi model reviews in loops.
If you had multiple people look at your PRs multiple times on different days results would be very similar.
No, depending on the complexity of the issue models can be into loops, where they go "this is definitely an issue and must be fixed", and then the resulting fixed code gets "this is definitely an issue and must be fixed", and then the resulting fixed code has the original 'issue'.
I've had it find bug, I asked it to make test to trigger the bug, and then it figured out it's not a bug. It will absolutely do wish fulfilment