Another project that hasn't run any real benchmarks. It's very easy to generate tokens; it's much harder to actually solve tasks locally.