Both Claude-3.5 and o1-preview nail this problem:
"Let's think through this step-by-step:
1. Alice has 3 brothers
2. Alice has 2 sisters
3. We need to find out how many sisters Alice's brother has
The key here is to realize that Alice's brothers would have the same sisters as Alice, except they would also count Alice as their sister.
So, Alice's brothers would have:
- The 2 sisters Alice has
- Plus Alice herself as a sister
Therefore, Alice's brothers have 3 sisters in total."
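For what it's worth, the counting in that answer can be written down in a couple of lines. This is only a sketch under the assumption that everyone in the puzzle is a full sibling and that Alice is female; the function name `sisters_of_brother` is made up for illustration.

```python
def sisters_of_brother(alice_sisters: int, alice_is_female: bool = True) -> int:
    """Sisters seen by any one of Alice's brothers, assuming full siblings.

    A brother has every sister Alice has, plus Alice herself if she is female.
    The number of brothers does not enter the count at all.
    """
    return alice_sisters + (1 if alice_is_female else 0)


# The version quoted above: Alice has 3 brothers and 2 sisters.
print(sisters_of_brother(alice_sisters=2))
```

The brother count (3 here) is a distractor: it never appears in the calculation.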
My problem with this puzzle is: how do you know that Alice and her brothers share both parents?
Is it not correct English to call two people who share only one parent sisters or brothers?
I guess I could be misled by my native Norwegian, where you have to prefix the word with "hel" (full) or "halv" (half) if you want to specify the number of shared parents.
And here lies the exact issue. Single tests don’t provide any meaningful insights. You need to perform this test at least twenty times in separate chat windows or via the API to obtain meaningful statistics.
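A rough sketch of what such a repeated test could look like. Everything here is an assumption for illustration: `ask_model` stands in for whatever function sends one prompt in a fresh conversation and returns the reply text, and taking the last integer in the reply as the model's answer is a crude heuristic, not a robust parser. The Wilson score interval is one common way to show how wide the uncertainty still is at twenty trials.

```python
import math
import re
from typing import Callable, Optional, Tuple


def extract_answer(reply: str) -> Optional[int]:
    """Heuristic: treat the last integer in the reply as the final answer."""
    numbers = re.findall(r"\d+", reply)
    return int(numbers[-1]) if numbers else None


def count_successes(ask_model: Callable[[str], str], prompt: str,
                    expected: int, trials: int = 20) -> int:
    """Send the same prompt `trials` times and count the correct answers.

    `ask_model` is a hypothetical callable; each call should start a fresh
    conversation so that the trials are independent.
    """
    return sum(extract_answer(ask_model(prompt)) == expected
               for _ in range(trials))


def wilson_interval(hits: int, trials: int, z: float = 1.96) -> Tuple[float, float]:
    """Approximate 95% Wilson score interval for a success proportion."""
    p = hits / trials
    denom = 1 + z ** 2 / trials
    centre = (p + z ** 2 / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials
                         + z ** 2 / (4 * trials ** 2)) / denom
    return centre - half, centre + half


if __name__ == "__main__":
    # Stand-in model that always answers correctly, just to show the flow.
    def fake_model(prompt: str) -> str:
        return "Therefore, Alice's brothers have 3 sisters in total."

    question = ("Alice has 3 brothers and 2 sisters. "
                "How many sisters does Alice's brother have?")
    hits = count_successes(fake_model, question, expected=3, trials=20)
    low, high = wilson_interval(hits, 20)
    print(f"{hits}/20 correct, 95% interval roughly {low:.2f} to {high:.2f}")
```

Even with twenty runs the interval stays fairly wide, so small differences between models or prompt variants need more trials before they mean much.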
Neither Claude-3.5 nor o1-preview was available at the time of the "Alice in Wonderland" paper.
But I tested both a few weeks ago with the problem translated into German, and also got a 100% success rate with each.
However, when I add irrelevant information (My mother ...), Claude's success rate drops to 85%:
"My mother has a sister called Alice. Alice has 2 sisters and 1 brother. How many sisters does Alice's brother have?"