Hacker News

snemvalts · today at 4:59 AM

Math and coding competition problems are easier to train on because of strict rules and cheap verification. But once you go beyond that to less well-defined things such as code quality, where even humans have a hard time putting down concrete axioms, models start to hallucinate more and become less useful.

We are missing the value function that allowed AlphaGo to go from a mid-range player trained on human moves to a superhuman one by playing itself. Since we have only made progress on unsupervised learning, and RL is constrained as above, I don't see this getting better.
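The self-play idea being referenced can be illustrated, very loosely, with a tabular value function on a toy game. This is a sketch under invented assumptions, not anything from the thread: the game (a tiny Nim variant), the function names, and the hyperparameters are all made up for illustration. The point is only that a cheap, exact win/loss signal from self-play is enough to learn which positions are good, with no human examples involved:

```python
import random

# Toy illustration of learning a value function purely from self-play.
# Game: Nim with n stones; players alternate removing 1 or 2 stones,
# and whoever takes the last stone wins.
# V[s] estimates the win probability for the player about to move with s stones left.

def train(episodes=20000, alpha=0.1, eps=0.1, n=10, seed=0):
    rng = random.Random(seed)
    V = {s: 0.5 for s in range(n + 1)}
    V[0] = 0.0  # no stones left: the player to move has already lost
    for _ in range(episodes):
        s = n
        visited = []  # (state, successor) pairs, one per move
        while s > 0:
            moves = [m for m in (1, 2) if m <= s]
            if rng.random() < eps:
                m = rng.choice(moves)  # occasional exploration
            else:
                # greedy: leave the opponent in the worst position
                m = min(moves, key=lambda mv: V[s - mv])
            visited.append((s, s - m))
            s -= m
        # the player who took the last stone won; back up minimax-style targets
        for state, nxt in visited:
            target = 1.0 - V[nxt]  # my value is one minus the opponent's at nxt
            V[state] += alpha * (target - V[state])
    return V

V = train()
# In this game, positions where s is a multiple of 3 are losses for the
# player to move; the self-play value function discovers that on its own.
```

The terminal reward here is trivially verifiable (someone took the last stone), which is exactly the property that, per the comment above, fuzzy targets like "code quality" lack.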


Replies

NitpickLawyer · today at 6:29 AM

> I don't see this getting better.

We went from 2 + 7 = 11 to "solved a frontier math problem" in 3 years, yet people don't think this will improve?

zozbot234 · today at 5:20 AM

This is not formally verified math, so there is no real verifiable-feedback aspect here. The best models for formalized math are still specialized ones, although general-purpose models can assist with formalization somewhat.
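For contrast, here is what machine-checkable feedback looks like once math is formalized — a trivial Lean 4 example of my own, not from the thread. The kernel either accepts the proof or rejects it, which is precisely the verifiable signal an informal natural-language solution lacks:

```lean
-- The Lean kernel checks this proof mechanically; a wrong statement
-- (say, 2 + 7 = 11) would be rejected, giving exact pass/fail feedback.
theorem two_add_seven : 2 + 7 = 9 := rfl
```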

anabis · today at 7:29 AM

> But once you go beyond that to less defined things such as code quality

I think they have a good optimization target with SWE-Bench-CI.

You are tested on continuous changes to a repository, spanning multiple years of the original repository's history. Cumulative edits need to stay maintainable and composable.

If something is missing from "can be maintained for multiple years while incorporating bugfixes and feature additions" as a definition of code quality, then more work is needed, but I think it's a good starting point.

jack_pp · today at 5:30 AM

Maybe to get a real breakthrough we have to make programming languages and tools better suited to LLM strengths, not fuss so much about making them write code we like. What we need is correct code, not nice-looking code.

eptcyka · today at 6:31 AM

Do we need all that if we can apply AI to solve practical problems today?

otabdeveloper4 · today at 5:23 AM

LLMs can often guess the final answer, but the intermediate proof steps are always total bunk.

When doing math, you only ever care about the proof, not the answer itself.

raincole · today at 6:13 AM

Except that's not how this specific instance works. In this case the problem isn't written in a formal language, and the AI's solution is not something one can automatically verify.

pjerem · today at 6:48 AM

I mean, even if the technology stopped improving immediately and forever (which is unlikely), LLMs are already better than most humans at most tasks.

Including code quality. Not because they are exceptionally good (you are right that they aren't superhuman like AlphaGo), but because most humans aren't that good at it anyway, and they also « hallucinate » in their own way out of tiredness.

Even today's models are far from being exploited to their full potential, because we have developed pretty much no tooling around them except tooling to generate code.

I'm also a long-time « doubter », but as a curious person I used the tools anyway, with all their flaws, over the last three years. And I'm forced to admit that hallucinations are pretty rare nowadays. Errors still happen, but they are rare, and it's easier than ever to get the model back on track.

I think I'm also a « believer » now, and believe me, I really don't want to be, because as much as I'm excited by this, I'm also pretty frightened of all the bad things this tech could do to the world in the wrong hands, and I don't feel like it's particularly in the right hands.

charcircuit · today at 7:45 AM

LLMs already do unsupervised learning to get better at creative things. This is possible since LLMs can judge the quality of what is being produced.

typs · today at 6:30 AM

I mean, this is why everyone is making bank selling RL environments in different domains to frontier labs.