Mentioned this in another comment, but the spec conformance suite is not representative of the things users care about (nor is it meant to be).
The spec mostly concerns itself with the semantics of annotations, not diagnostics or inference. I don't really recommend using it as the basis for choosing a type checker.
(I was on the Python Typing Council and helped put together the spec, the conformance test suite, etc.)
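A toy sketch of the annotations-versus-inference distinction, in case it helps. The function names here are made up for illustration and don't come from the spec or the conformance suite. The first function is fully annotated, so what its signature means is the kind of thing the spec pins down. The second leans on an unannotated local, and how a checker infers a type for it, or words any resulting diagnostic, is left to the tool and may differ between checkers.

```python
# Hypothetical example; names are illustrative only.

def total(values: list[int]) -> int:
    # Fully annotated signature: the meaning of `list[int]` and `int`
    # here is the kind of thing the typing spec describes.
    return sum(values)


def collect_ids() -> list[int]:
    # `ids` has no annotation. The type a checker assigns to it is an
    # inference decision, and tools may reasonably differ on it.
    ids = []
    ids.append(1)
    ids.append(2)
    return ids


if __name__ == "__main__":
    print(total(collect_ids()))
```

Both functions behave identically at runtime; any difference would only show up in what each checker reports for the unannotated local, which is the day-to-day behavior users tend to notice.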