Hacker News

amarant, last Monday at 7:21 PM

solve simple maths problems, for example the kind found in the game 4=10 [1]

It doesn't necessarily have to solve them reliably, since some of them are quite difficult, but LLMs are just comically bad at this kind of thing.

Any kind of novel-ish (can't just find the answers in the training data) logic puzzle like this is, in my opinion, a fairly good benchmark for "thinking".

Until an LLM can compete with a 10-year-old child at this kind of task, I'd argue that it's not yet "thinking". A thinking computer ought to be at least that good at maths, after all.

[1] https://play.google.com/store/apps/details?id=app.fourequals...
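
For readers who haven't seen the game, here is a rough sketch of the kind of puzzle being discussed, written as a brute-force search. The rules are an assumption on my part: use each of the four digits exactly once, in any order, with +, -, * and / and any placement of parentheses, to reach 10. The actual app may be stricter about parentheses. The names `solve` and `_search` are made up for illustration.

```python
from fractions import Fraction
from itertools import combinations


def solve(digits, target=10):
    """Return one expression using every digit once that equals target, or None."""
    # Each item pairs an exact value with the expression text that produced it.
    # Fractions avoid floating-point error when division is involved.
    items = [(Fraction(d), str(d)) for d in digits]
    return _search(items, Fraction(target))


def _search(items, target):
    # Base case: a single value is left, so check it against the target.
    if len(items) == 1:
        value, expr = items[0]
        return expr if value == target else None

    # Pick any two items, combine them with each operator, and recurse
    # on the smaller list that results.
    for i, j in combinations(range(len(items)), 2):
        (a, ea), (b, eb) = items[i], items[j]
        rest = [items[k] for k in range(len(items)) if k not in (i, j)]
        candidates = [
            (a + b, f"({ea} + {eb})"),
            (a * b, f"({ea} * {eb})"),
            (a - b, f"({ea} - {eb})"),
            (b - a, f"({eb} - {ea})"),
        ]
        # Skip division by zero.
        if b != 0:
            candidates.append((a / b, f"({ea} / {eb})"))
        if a != 0:
            candidates.append((b / a, f"({eb} / {ea})"))
        for value, expr in candidates:
            found = _search(rest + [(value, expr)], target)
            if found is not None:
                return found
    return None
```

Calling `solve([1, 2, 3, 4])` returns a fully parenthesized expression string that evaluates to 10, and `solve([1, 1, 1, 1])` returns `None` because four ones cannot reach 10 under these assumed rules. This only illustrates the puzzle format; it says nothing about how a person or a model would approach it.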


Replies

simonw, last Monday at 7:39 PM

> solve simple maths problems, for example the kind found in the game 4=10

I'm pretty sure that's been solved for almost 12 months now - the current-generation "reasoning" models are really good at those kinds of problems.
