The AlphaZero approach shows otherwise, as long as there is an automated way to generate new test cases and evaluate the outcomes.
We can't do it for all domains, but I believe we can for efficient code.
today's models could be probably already good enough to compose tasks, and evaluate the results.