>I spot check tests depending on how likely I think it is for the agent to have messed up or misinterpreted my instructions
So a percentage of your code, chosen by gut feeling, has not been seen by any human by the time you submit it.
Do you agree that this raises the chance of bugs slipping through? I don't see how it wouldn't.
And given that your code output is larger, that a larger share of it is buggy, and that (presumably) you write faster, have you considered what this means for the compounding likelihood of incidents?
There's definitely a class of bugs that is a lot more common now: the code deviates from the intent in some subtle way while still being functional. I deal with this using benchmarking and heavy dogfooding, and both of those expose errors and rough edges really well.