Hacker News

“Erdos problem #728 was solved more or less autonomously by AI”

590 points by cod1r yesterday at 10:39 PM | 335 comments

Comments

maxwells-daemon today at 1:03 AM

I work at Harmonic, the company behind Aristotle.

To clear up a few misconceptions:

- Aristotle uses modern AI techniques heavily, including language modeling.

- Aristotle can be guided by an informal (English) proof. If the proof is correct, Aristotle has a good chance at translating it into Lean (which is a strong vote of confidence that your English proof is solid). I believe that's what happened here.

- Once a proof is formalized into Lean (assuming you have formalized the statement correctly), there is no doubt that the proof is correct. This is the core of our approach: you can do a lot of (AI-driven) search, and once you find the answer you are certain it's correct no matter how complex the solution is.
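To make the last point concrete, here is a toy, hand-written Lean example (mine, not Aristotle output): however a proof term was found, whether by a human or by search, the kernel re-checks it, so trusting the result only requires trusting the statement and the checker.

```lean
-- Toy example (not Aristotle output): the kernel accepts this theorem only
-- if the proof term actually type-checks, no matter what produced it.
theorem my_add_comm (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b
```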

Happy to answer any questions!

show 13 replies
MyFirstSass today at 12:21 AM

Based on Tao's description of how the proof came about, a human was taking results back and forth between two separate AI tools and using an AI tool to fill in gaps the human found?

I don’t think it can really be said to have occurred autonomously then?

Looks more like a 50/50 partnership with a super-expert human on the one side, which makes this way more vague in my opinion - and in line with my own AI tests, i.e. they are pretty stupid, even Opus 4.5 or whatever, unless you're already an expert and are doing boilerplate.

EDIT: I can see the title has now been fixed from "solved" to "more or less solved", which I still think is a big stretch.

show 9 replies
observationist yesterday at 11:24 PM

Reconfiguring existing proofs in ways that have been tedious or obscured from humans, or using well-framed methods in novel ways, will be done at superhuman speeds, and it'll unlock all sorts of capabilities well before we have to be concerned about AGI. It's going to be awesome to see what mathematicians start to do with AI tools as they become capable of truly keeping up with what mathematicians want from them. It won't necessarily be a huge direct benefit for non-mathematicians at first, because the abstract and complex results won't have direct applications, but we might start to see Millennium Problems get taken down as legitimate frontier model benchmarks.

Or someone like Terence Tao might figure out how to wield AI better than anyone else, even the labs, and use the tools to take a bunch down at once. I'm excited to see what's coming this year.

show 5 replies
svat today at 12:00 AM

For context, Terence Tao started a wiki page titled “AI contributions to Erdős problems”: https://github.com/teorth/erdosproblems/wiki/AI-contribution... (as mentioned in an earlier post https://mathstodon.xyz/@tao/115818402639190439) — even relative to when he started this page less than two weeks ago (Dec 31), the current result (for problem [728]) represents a milestone: it is the first green in Section 1 of that wiki page.

show 1 reply
lwansbrough yesterday at 11:43 PM

Can anyone with specific knowledge in a sophisticated/complex field such as physics or math tell me: do you regularly talk to AI models? Do you feel like there's anything to learn? As a programmer, I can come to the AI with a problem and it can come up with a few different solutions, some I may have thought about, some not.

Are you getting the same value in your work, in your field?

show 11 replies
MORPHOICES today at 9:22 AM

I have kept track of a few instances where AI has been applied to real and genuine problems.

Not trivial problems. Issues with possible solutions, errors, and unresolved history.

AI did not "solve" any issues on its own, but what stood out to me was the speed at which concepts could be rewritten, restructured, and stress-tested.

A mental model that has been useful to me is that AI is not particularly good at providing the first answer, however, it is very good at providing the second, third, and tenth versions of the answer, especially when the first answer has already been identified as weak by a human.

In these instances, the progress seemed to stem from the AI being able to:

- Quickly reword and restate a given argument.

- Convert implicit assumptions into explicit ones.

- Identify small gaps in logic before they became large.

What I have been grappling with is how to differentiate when AI is just clarifying versus when it is silently hallucinating structure. Is the output of AI being treated as a draft, a reviewer, a rubber duck, or some combination? When is the output so fast that the rigor of thought is compromised? I am interested in how others are using AI for hard thinking and not just for writing cleanup.

tachim yesterday at 11:41 PM

You can try out Aristotle yourself today at https://aristotle.harmonic.fun/. No more waitlist!

show 3 replies
D-Machine yesterday at 11:30 PM

This is great; there is still so much potential in AI once we move beyond LLMs to specialized approaches like this.

EDIT: Look at all the people below just reacting to the headline and clearly not reading the posts. Aristotle (https://arxiv.org/abs/2510.01346) is key here folks.

EDIT2: It is clear many of the people below don't even understand basic terminology. Something being a transformer doesn't make it an LLM (vision transformers, anyone?), and if you aren't training on language (e.g. AlphaFold, or Aristotle on Lean), it isn't a "language" model.

show 5 replies
markusde today at 12:38 AM

Very cool to see how far things have come with this technology!

Please remember that this is a theorem about integers that admits a fairly elementary proof, one well supported by the existing Mathlib infrastructure. It seems that the AI relies on the symbolic proof checker, and the proofs being checked in this result don't use very complex definitions. In my experience, proofs like this, which are one step removed from existing infra, are much, much more likely to work.
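As a hedged illustration of that point (an invented example, not taken from the actual #728 proof), "one step removed from existing infra" can look like this: an elementary statement about integers that an existing Mathlib lemma closes almost immediately.

```lean
import Mathlib

-- Invented example, not from the Erdős #728 proof: an elementary fact about
-- integers that Mathlib's existing `dvd_pow` lemma closes in one step, with
-- `norm_num` discharging the side condition that the exponent is nonzero.
example (a b : ℤ) (h : a ∣ b) : a ∣ b ^ 2 :=
  dvd_pow h (by norm_num)
```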

Again though, this is really insanely cool!!

libraryofbabel today at 12:17 AM

2026 should be interesting. This stuff is not magic, and progress is always going to be gradual with solutions to less interesting or "easier" problems first, but I think we're going to see more milestones like this with AI able to chip away around the edges of unsolved mathematics. Of course, that will require a lot of human expertise too: even this one was only "solved more or less autonomously by AI (after some feedback from an initial attempt)".

People are still going to be moving the goalposts on this and claiming it's not all that impressive, or that the solution must have been in the training data, or something, but at this point that's getting close to arguing that Terence Tao doesn't know what he's talking about, which is, to say the least, a perilous position.

At this point, I think I'm making a belated New Year's resolution to stop arguing with people who are still saying that LLMs are stochastic parrots that just remix their training data and can never come up with anything novel. I think that discussion is now dead. There are lots of fascinating issues to work out with how we can best apply LLMs to interesting problems (or get them to write good code), but to even start solving those issues you have to accept that they are at least somewhat capable of doing novel things.

In 2023 I would have bet hard against us getting to this point ("there's no way chatbots can actually reason their way through novel math!"), but here we are, three years later. I wonder what comes next?

show 3 replies
snowmobile today at 9:57 AM

Digging through the PDFs on Google Drive, this seems to be (one of) the generated proofs. I may be misunderstanding something, but 1400 lines of AI-generated code seems like a very good place for a mistake in translation to sneak in: https://github.com/plby/lean-proofs/blob/main/src/v4.24.0/Er...

Though I suppose if the problem statement in Lean is human-generated and there is no way to "cheat" in a Lean proof, the proof could be trusted without understanding it.
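Concretely, Lean can report which axioms (and escape hatches) a proof ultimately depends on; here is a toy sketch of that audit, using an invented theorem:

```lean
-- Toy sketch: `#print axioms` lists everything a proof ultimately relies on.
-- A proof that smuggled in `sorry` would show `sorryAx` here, so a clean
-- axiom list plus a correctly stated theorem is all a human needs to check.
theorem toy (n : Nat) : ∃ k, 2 * n = 2 * k := ⟨n, rfl⟩

#print axioms toy
```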

aziis98 today at 9:40 AM

The Erdős problems website says the theorem is formalized in Lean, but in the Mathlib project there is just the theorem statement with a sorry. Does anyone know where I can find the Lean proof? Maybe it's in some random pull request I didn't find.

Edit: Found it here https://github.com/plby/lean-proofs/blob/main/src/v4.24.0/Er...
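For anyone unfamiliar with what "a theorem statement with a sorry" looks like, a schematic sketch (invented statement, not the real formalization): the statement is pinned down, and the proof is stubbed out until someone, human or AI, supplies it.

```lean
-- Schematic sketch with an invented statement (not the real Erdős #728
-- formalization): `sorry` marks a proof obligation that is still open.
theorem erdos_placeholder (n : Nat) : ∃ m, n < m := by
  sorry
```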

zkmon today at 8:28 AM

When Deep Blue beat Kasparov, it did not end the careers of human players. But since mathematics is not a sport with human players, what are the career prospects for mathematicians or mathematics-like fields?

show 2 replies
cultofmetatron today at 12:21 AM

I remember seeing a documentary with a bit about some guy whose life's work was computing pi to 30 digits. Imagine spending all that time to do what my computer can do in less than a second, plus a day or two to write the code using the algorithm he used. Ten minutes if you use Newton's method.
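For scale, here's a rough sketch of that sub-second computation (my own toy, using Machin's arctangent formula rather than whatever algorithm he actually used):

```python
# Machin's 1706 formula: pi/4 = 4*arctan(1/5) - arctan(1/239).
# Plain integer arithmetic gets 30 digits of pi in well under a second.
def arctan_inv(x: int, digits: int) -> int:
    """arctan(1/x) scaled by 10**(digits + 10), via the Taylor series."""
    scale = 10 ** (digits + 10)   # 10 guard digits absorb truncation error
    term = scale // x
    total, n, sign = term, 1, 1
    while term:
        term //= x * x            # next odd power of 1/x
        n += 2
        sign = -sign
        total += sign * term // n
    return total

digits = 30
pi = 4 * (4 * arctan_inv(5, digits) - arctan_inv(239, digits))
print(pi // 10 ** 10)  # pi to 30 decimal places, as an integer (no point)
```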

show 1 reply
dylanz today at 3:34 PM

Does it work on cryptography? Can it find the method behind the fourth Kryptos passage?

thomasahle today at 12:19 AM

It took Andrew Wiles 7 years of intense work to solve Fermat's Last Theorem.

The METR institute predicts that the length of tasks AI agents can complete doubles every 7 months.

We should expect it to take until 2033 before AI solves Clay Institute-level problems with 50% reliability.
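If you take both numbers at face value, the arithmetic behind that date looks roughly like this (the ~5-hour current horizon and 2,000 working hours per year are assumptions of mine):

```python
# Back-of-the-envelope: how many 7-month doublings until the agent task
# horizon reaches a Wiles-sized task? All inputs here are assumptions.
import math

current_horizon_hours = 5   # assumed agent task horizon today
target_hours = 7 * 2000     # ~7 years at ~2000 working hours per year

doublings = math.log2(target_hours / current_horizon_hours)
years = doublings * 7 / 12
print(f"{doublings:.1f} doublings ~= {years:.1f} years")  # ~11.5, ~6.7 years
```

Starting from 2026, that lands right around 2033.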

show 3 replies
dnw today at 12:54 AM

I really want to see if someone can prompt out a more elegant proof of Fermat's Last Theorem than Wiles's.

esafak today at 12:24 AM

How are academics going to assess AI-coauthored research for appointment and promotion?

show 1 reply
emsign today at 6:07 AM

Sounds to me like the actual work was done by the researchers in the discussions with ChatGPT.

kittikitti today at 4:31 PM

This is a great achievement for AI! I quickly read through the thread but found Tao's page on GitHub easier to comprehend:

https://github.com/teorth/erdosproblems/wiki/AI-contribution...

It classifies the advancements based on the level of AI input. In particular, the entry in Table 1 related to the original post has both a green and yellow light, reflecting the skepticism from others.

shevy-java today at 7:50 AM

Skynet 3.0 is annoying.

show 1 reply
maximgeorge today at 7:45 AM

[dead]

bgwalter yesterday at 11:47 PM

[flagged]

show 7 replies
leggothrow today at 12:08 AM

This almost implies mathematicians aren’t some ungodly geniuses if something as absolutely dumb as an LLM can solve these problems via blind pattern matching.

Meanwhile I can’t get Claude code to fix its own shit to save my life.

show 4 replies
muldvarp today at 3:53 PM

Everyone who works for a living is about to have a really bad time.