
s-macke, 10/11/2024 (3 replies)

These results are very similar to the "Alice in Wonderland" problem [1, 2], which was already discussed a few months ago. However, the authors of that paper are much more critical and call it a "Complete Reasoning Breakdown".

You could argue that the issue lies in the models being in an intermediate state between pattern matching and reasoning.

To me, such results indicate that you can't trust any LLM benchmark results related to math and reasoning when changing the characters, numbers, or sentence structure of a problem alters the outcome by more than 20 percentage points.

[1] https://arxiv.org/html/2406.02061v1

[2] https://news.ycombinator.com/item?id=40811329
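To make the perturbation point concrete, here is a minimal sketch of how one could generate variants of the same problem with different names and numbers and then score a model against the known answers. The template wording, the name list, and the helpers `make_variants` and `accuracy` are my own assumptions for illustration, not the setup used in either paper.

```python
import random

# Hypothetical template modeled on the "Alice in Wonderland" style of question.
TEMPLATE = (
    "{name} has {brothers} brothers and she also has {sisters} sisters. "
    "How many sisters does {name}'s brother have?"
)

def make_variants(n, seed=0):
    """Return n (prompt, correct_answer) pairs with varied names and numbers."""
    rng = random.Random(seed)
    names = ["Alice", "Maria", "Priya", "Chen"]  # illustrative names only
    variants = []
    for _ in range(n):
        name = rng.choice(names)
        brothers = rng.randint(1, 6)
        sisters = rng.randint(1, 6)
        prompt = TEMPLATE.format(name=name, brothers=brothers, sisters=sisters)
        # A brother has all of the girl's sisters plus the girl herself.
        variants.append((prompt, sisters + 1))
    return variants

def accuracy(model_answers, variants):
    """Fraction of variants where the model's integer answer is correct."""
    correct = sum(a == truth for a, (_, truth) in zip(model_answers, variants))
    return correct / len(variants)

if __name__ == "__main__":
    for prompt, answer in make_variants(3):
        print(prompt, "->", answer)
```

If a model really reasons about the problem, its accuracy should stay roughly constant across such variants; a large swing would point toward pattern matching on the surface form.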


Replies

oliwary, 10/11/2024

Someone (https://x.com/colin_fraser/status/1834336440819614036) shared an example that I thought was interesting relating to their reasoning capabilities:

A man gets taken into a hospital. When the doctor sees him, he exclaims "I cannot operate on this person, he is my own son!". How is this possible?

All LLMs I have tried this on, including o1-preview, get this wrong, assuming that the riddle relies on the gendered assumption that the doctor is a man when the doctor is in fact a woman. However, in this case there is no paradox: it is made clear that the doctor is a man ("he exclaims"), so he must be the father of the person being brought in. The fact that the LLMs get this wrong suggests that they find a similar reasoning pattern and then apply it. Even after additional prodding, a model continued making the mistake, arguing at one point that it could be a same-sex relationship.

Amusingly, when someone on HN mentioned this example in the o1 thread, many of the HN commenters also misunderstood the problem. Perhaps humans also mostly reason from previous examples rather than thinking from scratch.

apsec112, 10/11/2024

Both Claude-3.5 and o1-preview nail this problem

"Let's think through this step-by-step:

1. Alice has 3 brothers
2. Alice has 2 sisters
3. We need to find out how many sisters Alice's brother has

The key here is to realize that Alice's brothers would have the same sisters as Alice, except they would also count Alice as their sister.

So, Alice's brothers would have:
- The 2 sisters Alice has
- Plus Alice herself as a sister

Therefore, Alice's brothers have 3 sisters in total."
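For anyone who wants to check the arithmetic in that answer, a small sketch that builds the family explicitly and counts the girls from a brother's point of view. The function name and the sibling labels are hypothetical, chosen only for this illustration.

```python
def sisters_of_a_brother(num_brothers: int, num_sisters: int) -> int:
    """Count how many sisters any one of Alice's brothers has."""
    if num_brothers < 1:
        raise ValueError("Alice needs at least one brother for the question to make sense")
    # The girls in the family are Alice plus her sisters.
    girls = ["Alice"] + [f"sister_{i}" for i in range(1, num_sisters + 1)]
    # Every girl in the family is a sister to each brother.
    return len(girls)

print(sisters_of_a_brother(num_brothers=3, num_sisters=2))  # prints 3
```

The number of brothers does not affect the result; it only has to be at least one.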

s-macke, 10/14/2024

Here is the larger discussion of the "Alice in Wonderland" paper on Hacker News:

https://news.ycombinator.com/item?id=40585039