In which distribution? Like school math, competition math, or unsolved problems? FWIW I think the first and third are probably easier to generate synthetically. It's harder to bound the difficulty, but I think the recent David Silver talk implies that doesn't matter much. Anyway, there's some work on this you can find online; they claim to improve GSM8K and MATH a bit but not saturate them. Idk how useful it is in practice.