> Demonstrably working, as in you can prove the code actually works by then putting it to use.
That's not how you prove that code works properly and isn't going to fail due to some obscure or unforeseen corner case. You need actual proof that's driven by the code's overall structure. Humans do this at least informally when they code; AIs can't do that with any reliability, especially not for non-trivial projects (for reasons that are quite structural and hard to change), so most coding agents simply work their way iteratively to get their test results to pass. That's not a robust methodology.
>That's not how you prove that code works properly and isn't going to fail due to some obscure or unforeseen corner case.
So? We didn't prove human code "isn't going to fail due to some obscure or unforeseen corner case" either (aside from the tiny niche of formal verification).
So from that aspect it's quite similar.
>so most coding agents simply work their way iteratively to get their test results to pass. That's not a robust methodology.
You seem to imply they do some sort of random iteration until the tests pass, which is not the case. Usually they can see the failing test, describe the issue exactly as a human programmer would, and then fix it.
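A minimal sketch of that loop, with an entirely hypothetical example (the `word_count` functions and the test are mine, not from anyone's actual project): the failing assertion names the exact input that broke, which is what lets a human or an agent diagnose the corner case and apply a targeted fix rather than iterate randomly.

```python
def word_count_v1(text):
    # First attempt, with an unforeseen corner case: splitting on a
    # single space turns runs of whitespace into phantom empty "words".
    return len(text.split(" "))

def word_count_v2(text):
    # Targeted fix after reading the failure: split() with no argument
    # collapses arbitrary whitespace and drops empty strings.
    return len(text.split())

def run_test(fn):
    # The assertion message pinpoints the input that broke,
    # which is what makes the subsequent fix a diagnosis, not a guess.
    assert fn("a  b") == 2, f"{fn.__name__} miscounted 'a  b'"

try:
    run_test(word_count_v1)
except AssertionError as e:
    print(e)  # failure message identifies the corner case

run_test(word_count_v2)  # the targeted fix passes
```

Whether this counts as "robust" is exactly the disagreement in this thread, but it is not random mutation until green.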
> That's not how you prove that code works properly
Yes it is. What do you expect, formal verification of a toy GUI library? Get real.
> and isn't going to fail due to some obscure or unforeseen corner case.
That's called "a bug", they get fixed when they're found. This isn't aerospace software, failure is not only an option, it's an expected part of the process.
> You need actual proof that's driven by the code's overall structure.
I literally don't.
> Humans do this at least informally when they code, AIs can't do that with any reliability
Sounds like a borderline theological argument. Coding agents one-shot problems a lot more often than I ever did. Results are what matters, demonstrable results.