I agree, and yet I think that even with a well-engineered agent harness, there are a lot of unknown unknowns out there.
I imagine the problem will persist if users continue to submit PRs that pass the harness without being able to validate for themselves that the changes actually work.