You get AIs to prove their code is correct the same way you get humans to prove theirs: you make them demonstrate it with tests or other evidence, such as screenshots or logs of successful runs.
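As a rough illustration of what "tests plus a log of a successful run" could look like, here is a minimal sketch in Python. Everything in it is an assumption made up for the example: `slugify` stands in for whatever code the AI (or human) wrote, and `run_with_evidence` is a hypothetical helper that runs the tests and captures the runner's output so it can be attached as evidence.

```python
import io
import unittest
from typing import Tuple


def slugify(title: str) -> str:
    """Hypothetical code under review: lowercase, hyphen-separated words."""
    return "-".join(title.lower().split())


class SlugifyTest(unittest.TestCase):
    """The tests the author is asked to supply alongside the code."""

    def test_basic(self):
        self.assertEqual(slugify("Hello World"), "hello-world")

    def test_collapses_whitespace(self):
        self.assertEqual(slugify("  A   B "), "a-b")


def run_with_evidence() -> Tuple[bool, str]:
    """Run the tests and return (passed, log); the log is the evidence."""
    log = io.StringIO()
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(SlugifyTest)
    result = unittest.TextTestRunner(stream=log, verbosity=2).run(suite)
    return result.wasSuccessful(), log.getvalue()


if __name__ == "__main__":
    passed, log = run_with_evidence()
    print(log)  # attach this output to the review
    print("PASSED" if passed else "FAILED")
```

The point of returning the log rather than just a boolean is that a reviewer can read what was actually run, instead of taking "it works" on trust.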