Hacker News

NitpickLawyer, 05/15/2025

I agree. At first the problems that you try to solve need to be verifiable.

But there's progress on this on many fronts. There's been increased interest in provers (natural language to Lean, for example). There's also been progress in LLM-as-a-judge on open-ish problems. And it seems that RL can help with extracting step rewards in sparse-reward domains.