Code correctness should be checked automatically by CI and the test suite, and new changes should come with new tests. That is exactly what keeps these stupid errors from ever reaching the reviewer. The same goes for code formatting and documentation.
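Here is a minimal sketch of the kind of pre-review gate described above. The check commands (`pytest -q`, `black --check .`) are only examples of what a project might run. `run_checks`, `gate`, and `EXAMPLE_CHECKS` are hypothetical names, not part of any CI product's API.

```python
import subprocess
import sys


def run_checks(checks: dict[str, list[str]]) -> dict[str, bool]:
    """Run each named check command; a check passes if it exits with status 0."""
    results = {}
    for name, cmd in checks.items():
        completed = subprocess.run(cmd, capture_output=True, text=True)
        results[name] = completed.returncode == 0
    return results


def gate(results: dict[str, bool]) -> bool:
    """Return True only if every check passed; print the ones that failed."""
    failed = [name for name, ok in results.items() if not ok]
    for name in failed:
        print(f"check failed: {name}")
    return not failed


# Hypothetical project checks (assumes pytest and black are installed);
# a CI job would call gate(run_checks(EXAMPLE_CHECKS)) before requesting review.
EXAMPLE_CHECKS = {
    "tests": [sys.executable, "-m", "pytest", "-q"],
    "formatting": [sys.executable, "-m", "black", "--check", "."],
}
```

The point of the design is that a human reviewer only sees a change once `gate` returns True, so trivial failures never land on their desk.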
What exactly is the analogy you’re suggesting, using LLMs to verify the citations?
This discussion makes me think peer review needs automated tooling somewhat analogous to what software engineers have long relied on. For example, a tool could use an LLM to check that each citation actually substantiates the claim the paper attributes to it, and flag the claim for review if it doesn't.
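A minimal sketch of how such a citation checker could be structured. The `Citation` record, `flag_unsupported`, and the `ref-1`/`ref-2` keys are hypothetical. The LLM call is deliberately left pluggable: `naive_overlap_checker` is a toy word-overlap stand-in so the example runs without any model. It is not a real test of whether a source supports a claim.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Citation:
    claim: str         # sentence in the paper that relies on the citation
    reference_id: str  # key of the cited work (hypothetical)
    source_text: str   # relevant passage from the cited work


# A checker answers: does source_text support claim?
# In the proposed tool this would wrap an LLM call.
Checker = Callable[[str, str], bool]


def flag_unsupported(citations: list[Citation], supports: Checker) -> list[Citation]:
    """Return the citations the checker says are NOT supported by their source."""
    return [c for c in citations if not supports(c.claim, c.source_text)]


def naive_overlap_checker(claim: str, source_text: str, threshold: float = 0.5) -> bool:
    """Toy stand-in for an LLM: share of claim words that also appear in the source."""
    claim_words = {w.strip(".,;:").lower() for w in claim.split()}
    source_words = {w.strip(".,;:").lower() for w in source_text.split()}
    claim_words.discard("")
    if not claim_words:
        return True
    return len(claim_words & source_words) / len(claim_words) >= threshold


citations = [
    Citation("Caching reduces latency for repeated requests.", "ref-1",
             "Our results show that caching reduces latency for repeated requests by avoiding recomputation."),
    Citation("The cited study proves the method is always faster.", "ref-2",
             "In two of five benchmarks the method was slower than the baseline."),
]
flagged = flag_unsupported(citations, naive_overlap_checker)
for c in flagged:
    print(f"needs human review: {c.reference_id}: {c.claim}")
```

Here only `ref-2` gets flagged. The reviewer would then look at the flagged claims instead of checking every citation by hand.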