They directly contradict the claim that running other models on the same codebases would give similar results.