Solve simple maths problems, for example the kind found in the game 4=10 [1].
It doesn't necessarily have to solve them reliably, since some of them are quite difficult, but LLMs are just comically bad at this kind of thing.
Any kind of novel-ish logic puzzle like this (one whose answers can't simply be looked up in the training data) is, in my opinion, a fairly good benchmark for "thinking".
Until an LLM can compete with a 10-year-old child at this kind of task, I'd argue that it's not yet "thinking". A thinking computer ought to be at least that good at maths, after all.
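For anyone unfamiliar: in 4=10 you get four digits and have to combine them with +, -, *, / and parentheses so the expression equals 10. The puzzle is small enough to brute-force; here's a minimal Python sketch (the `solve` helper is mine, and it assumes the digits may be reordered and all four operators are available, which the real game sometimes restricts):

    from itertools import permutations, product

    def solve(digits, target=10):
        """Brute-force a 4=10-style puzzle: combine four digits with
        +, -, *, / and parentheses so the expression equals target."""
        ops = "+-*/"
        # The five ways to parenthesise four operands with three binary ops.
        templates = [
            "(({a}{p}{b}){q}{c}){r}{d}",
            "({a}{p}({b}{q}{c})){r}{d}",
            "({a}{p}{b}){q}({c}{r}{d})",
            "{a}{p}(({b}{q}{c}){r}{d})",
            "{a}{p}({b}{q}({c}{r}{d}))",
        ]
        solutions = set()
        for a, b, c, d in set(permutations(map(str, digits))):
            for p, q, r in product(ops, repeat=3):
                for t in templates:
                    expr = t.format(a=a, b=b, c=c, d=d, p=p, q=q, r=r)
                    try:
                        if abs(eval(expr) - target) < 1e-9:
                            solutions.add(expr)
                    except ZeroDivisionError:
                        pass
        return solutions

    print(solve([1, 1, 5, 8]))  # finds e.g. 8/(1-(1/5)) == 10

A 10-year-old can't enumerate all those cases, of course, which is exactly why solving one requires something that looks like reasoning rather than lookup.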
[1] https://play.google.com/store/apps/details?id=app.fourequals...
> Solve simple maths problems, for example the kind found in the game 4=10
I'm pretty sure that's been solved for almost 12 months now; the current generation of "reasoning" models is really good at those kinds of problems.