Hacker News

andrepd · 10/11/2024

>And in any case, I think OpenAI’s o1 models are crushing it in math right now.

My man, it cannot solve even the simplest problems it hasn't already seen solved, and it routinely makes elementary errors in simple algebraic manipulations or arithmetic! All of this points to the fact that it cannot actually perform mathematical or logical reasoning, only mimic it superficially when trained on enough examples.

I challenge you to give it even a simple, but original, problem to solve.


Replies

Workaccount2 · 10/11/2024

>I challenge you to give it even a simple, but original, problem to solve.

(34903173/x)+(238 * 2650) - 323326 = 45323434, solve for x

Statistically, no one has ever done this exact calculation before. It's entirely unique.

O1 answered "x = 34,903,173 divided by 45,016,060", which is correct.[1][2]
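The algebra is easy to verify by hand (a minimal Python sketch of my own; the exact-arithmetic check via fractions is illustrative, not part of the original exchange):

    # (34903173/x) + (238*2650) - 323326 = 45323434
    # Move the constant terms to the right-hand side:
    #   34903173/x = 45323434 - 238*2650 + 323326 = 45016060
    from fractions import Fraction

    rhs = 45323434 - 238 * 2650 + 323326      # evaluates to 45016060
    x = Fraction(34903173, rhs)               # x = 34903173 / 45016060
    # Plug x back into the original equation to confirm it holds exactly:
    assert Fraction(34903173) / x + 238 * 2650 - 323326 == 45323434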

Now I guess you can pick up the goal post and move it.

[1]https://chatgpt.com/share/6709481a-3144-8004-a7fd-0ccd9e3bc5...

[2]https://www.wolframalpha.com/input?i=%2834903173%2Fx%29%2B%2...

WhitneyLand · 10/11/2024

Please provide your precise definitions of “reasoning” and “original”.

There’s no consensus in the literature on what these mean even if you make it more specific by talking about “mathematical reasoning”, so I don’t really understand what opinions like these are based on.

I see a lot of the no-true-Scotsman fallacy going around; even the paper resorts to it, as it actually uses phrases like "true reasoning" several times.

I don't think the paper is very convincing, by the way. The abstract is kind of click-baity: it talks about 65% variation, but that was a cherry-picked example from a tiny Phi model, and the SOTA models showed far less variation, which was arguably not that interesting.

ukuina · 10/11/2024

Do you have some categories of such original problems? It seems markedly better at reasoning/logic puzzles, and programmatically-solvable problems are often offloaded to the Python interpreter.