Yes, in theory. But these are inherently non-deterministic systems interpreting English prose. It's not the same thing as a real honest-to-God program that executes a deterministic algorithm to verify the output.
I can't believe we've sunk so low that we're complaining the non-deterministic black box didn't respect "YOU MUST DO THIS" or "DO NOT DO THIS" commands in a Markdown file. We used to be engineers.