Hacker News

snemvalts · today at 4:59 AM

Math and coding competition problems are easier to train on because of strict rules and cheap verification. But once you go beyond that to less well-defined things such as code quality, where even humans have a hard time putting down concrete axioms, models start to hallucinate more and become less useful.

We are missing the value function that allowed AlphaGo to go from a mid-range player trained on human moves to a superhuman one by playing itself. Since we have only made progress on unsupervised learning, and RL is constrained as above, I don't see this getting better.
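The self-play idea being referenced can be illustrated, very loosely, with a tabular value function on a toy game. This is a sketch under invented assumptions, not anything from the thread: the game (a tiny Nim variant), the function names, and the hyperparameters are all made up for illustration. The point is only that a cheap, exact win/loss signal from self-play is enough to learn which positions are good, with no human examples involved:

```python
import random

# Toy illustration of learning a value function purely from self-play.
# Game: Nim with n stones; players alternate removing 1 or 2 stones,
# and whoever takes the last stone wins.
# V[s] estimates the win probability for the player about to move with s stones left.

def train(episodes=20000, alpha=0.1, eps=0.1, n=10, seed=0):
    rng = random.Random(seed)
    V = {s: 0.5 for s in range(n + 1)}
    V[0] = 0.0  # no stones left: the player to move has already lost
    for _ in range(episodes):
        s = n
        visited = []  # (state, successor) pairs, one per move
        while s > 0:
            moves = [m for m in (1, 2) if m <= s]
            if rng.random() < eps:
                m = rng.choice(moves)  # occasional exploration
            else:
                # greedy: leave the opponent in the worst position
                m = min(moves, key=lambda mv: V[s - mv])
            visited.append((s, s - m))
            s -= m
        # the player who took the last stone won; back up minimax-style targets
        for state, nxt in visited:
            target = 1.0 - V[nxt]  # my value is one minus the opponent's at nxt
            V[state] += alpha * (target - V[state])
    return V

V = train()
# In this game, positions where s is a multiple of 3 are losses for the
# player to move; the self-play value function discovers that on its own.
```

The terminal reward here is trivially verifiable (someone took the last stone), which is exactly the property that, per the comment above, fuzzy targets like "code quality" lack.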


Replies

NitpickLawyer · today at 6:29 AM

> I don't see this getting better.

We went from 2 + 7 = 11 to "solved a frontier math problem" in 3 years, yet people don't think this will improve?

zozbot234 · today at 5:20 AM

This is not formally verified math, so there is no real verifiable-feedback aspect here. The best models for formalized math are still specialized ones, although general-purpose models can assist with formalization somewhat.
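For contrast, here is what machine-checkable feedback looks like once math is formalized — a trivial Lean 4 example of my own, not from the thread. The kernel either accepts the proof or rejects it, which is precisely the verifiable signal an informal natural-language solution lacks:

```lean
-- The Lean kernel checks this proof mechanically; a wrong statement
-- (say, 2 + 7 = 11) would be rejected, giving exact pass/fail feedback.
theorem two_add_seven : 2 + 7 = 9 := rfl
```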

anabis · today at 7:29 AM

> But once you go beyond that to less defined things such as code quality

I think they have a good optimization target with SWE-Bench-CI.

You are tested on continuous changes to a repository, spanning multiple years of the original repository's history. Cumulative edits need to stay maintainable and composable.

If something is missing from "can be maintained for multiple years while incorporating bugfixes and feature additions" as a definition of code quality, then more work is needed, but I think it's a good starting point.

jack_pp · today at 5:30 AM

Maybe to get a real breakthrough we have to make programming languages and tools better suited to LLM strengths, not fuss so much about making them write code we like. What we need is correct code, not nice-looking code.

eptcyka · today at 6:31 AM

Do we need all that if we can apply AI to solve practical problems today?

otabdeveloper4 · today at 5:23 AM

LLMs can often guess the final answer, but the intermediate proof steps are always total bunk.

When doing math, you only ever care about the proof, not the answer itself.

raincole · today at 6:13 AM

Except that's not how this specific instance works. In this case the problem isn't written in a formal language, and the AI's solution is not something one can automatically verify.

pjerem · today at 6:48 AM

I mean, even if the technology stopped improving immediately and forever (which is unlikely), LLMs are already better than most humans at most tasks.

Including code quality. Not because they are exceptionally good (you are right that they aren't superhuman like AlphaGo), but because most humans aren't that good at it anyway, and they also « hallucinate » in their own way out of tiredness.

Even today's models are far from being exploited to their full potential, because we have developed pretty much no tooling around them except tooling to generate code.

I'm also a long-time « doubter », but as a curious person I used the tools anyway, with all their flaws, over the last three years. And I'm forced to admit that hallucinations are pretty rare nowadays. Errors still happen, but they are rare, and it's easier than ever to get the model back on track.

I think I'm also a « believer » now, and believe me, I really don't want to be, because as much as I'm excited by this, I'm also pretty frightened of all the bad things this tech could do to the world in the wrong hands, and I don't feel like it's particularly in the right hands.

charcircuit · today at 7:45 AM

LLMs already do unsupervised learning to get better at creative things. This is possible since LLMs can judge the quality of what is being produced.

typs · today at 6:30 AM

I mean, this is why everyone is making bank selling RL environments in different domains to frontier labs.