That seems unrelated? I think we are talking past each other. Phi was trained on purely synthetic data derived from emulating the benchmark suite. Not surprisingly, this resulted in state-of-the-art scores, and a model that was 100% useless at anything other than making the benchmark number go up.