Something weird here, why is it so hard to have a deterministic program capable of checking a proof or anything math related, aren't maths super deterministic when natural language is not. From first principles, it should be possible to do this without a llm verifier.
It's cool, but I genuinely cannot fathom why they are targeting natural language proofs instead of a proof assistant.
Exciting stuff from a fantastic team.
So it's designed for informal proofs and it "verifies" based on a rubric fitting function and human interaction, is that right?
What's the use case for a system like this?