OK! So if someone uses an existing, checkpointed, open-source model, then the answer is yes: the results are valid, and it doesn't matter that the tests are public.
Yes, assuming the checkpoint was before the announcement & public availability of the test set.
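For what it's worth, the condition above boils down to a date comparison. Here is a minimal sketch, assuming you know both dates; the function name and the example dates are hypothetical and not taken from any real model or benchmark.

```python
from datetime import date


def checkpoint_predates_test_set(checkpoint_date: date, test_release_date: date) -> bool:
    """Return True if the checkpoint was frozen strictly before the test set went public."""
    return checkpoint_date < test_release_date


# Hypothetical dates, for illustration only.
checkpoint = date(2023, 1, 15)
test_release = date(2023, 6, 1)

if checkpoint_predates_test_set(checkpoint, test_release):
    print("Checkpoint predates the public test set.")
else:
    print("Checkpoint may have been trained after the test set was public.")
```

This only covers the timing condition; it says nothing about whether the same data was available elsewhere before the announcement.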