I haven't had any success getting AIs to output working proofs.
You'd need a completely different post-training and agent stack for that.