If some of those input-output pairs come from one interpretation of the spec and others come from a different one, it's possible that no program satisfies all the tests (or, worse, that a program that satisfies all the tests still isn't correct).
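As a rough illustration of both failure modes, here is a minimal sketch. It assumes a made-up spec, "round to the nearest integer", which is silent about ties. The rounding example and every name in it (`round_half_up`, `round_half_even`, `frankenstein`, and the test lists) are hypothetical and not taken from the discussion above.

```python
import math

# Two reasonable readings of the hypothetical spec "round to the nearest integer".
def round_half_up(x):
    return math.floor(x + 0.5)   # ties go up: 2.5 -> 3

def round_half_even(x):
    return round(x)              # Python's built-in sends ties to even: 2.5 -> 2

def passes(program, tests):
    """True if the program produces the expected output for every test."""
    return all(program(x) == expected for x, expected in tests)

def is_satisfiable(tests):
    """A deterministic program can pass only if no input has two expected outputs."""
    seen = {}
    for x, expected in tests:
        if seen.setdefault(x, expected) != expected:
            return False
    return True

# Case 1: the same input was written down under both readings,
# so no deterministic program can pass both tests.
contradictory_tests = [(2.5, 3), (2.5, 2)]

# Case 2: each test comes from a different reading, but on different inputs,
# so the suite is satisfiable even though it matches neither reading.
mixed_tests = [(0.5, 1), (2.5, 2)]

def frankenstein(x):
    """Passes mixed_tests, yet agrees with neither reading in general."""
    return 1 if x < 1 else round(x)

print(is_satisfiable(contradictory_tests))     # False
print(passes(round_half_up, mixed_tests))      # False
print(passes(round_half_even, mixed_tests))    # False
print(passes(frankenstein, mixed_tests))       # True
```

In the second case the suite actually rejects both sensible implementations and accepts one that returns 1 for an input of 0.2, which is wrong under either reading.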