Do you have some categories of these original problems in mind? It seems markedly better at reasoning/logic puzzles, and programmatically solvable problems are often offloaded to the Python interpreter.