Tests are only rigorous if the correct intent is encoded in them. Perfectly working software can be wrong if the intent was inferred incorrectly. I leverage BDD heavily, and there a lot of little details it's possible to misinterpret going from spec -> code. If the spec was sufficient to fully specify the program, it would be the program, so there's lots of room for error in the transformation.
> If the spec was sufficient to fully specify the program, it would be the program
Very salient concept in regards to LLM's and the idea that one can encode a program one wishes to see output in natural English language input. There's lots of room for error in all of these LLM transformations for same reason.
Then I disagree with you
> You still have to have a human who knows the system to validate that the thing that was built matches the intent of the spec.
You don't need a human who knows the system to validate it if you trust the LLM to do the scenario testing correctly. And from my experience, it is very trustable in these aspects.
Can you detail a scenario by which an LLM can get the scenario wrong?