Hacker News

yk 10/11/2024

I test LLMs in a similar way, actually. For example, there's a well-known logic puzzle where a farmer tries to cross a river with a cabbage, a goat, and a wolf. LLMs have been able to solve that since at least GPT-2; however, if we replace the wolf with a cow, GPT-4o correctly infers the rules of the puzzle but can't solve it.
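
For comparison, the puzzle family itself is trivial for a plain breadth-first search. A minimal sketch in Python (the state encoding is mine, and treating the cow variant as just dropping the wolf-goat rule is an assumption):

  from collections import deque
  from itertools import combinations

  def solve(items, eats, capacity=1):
      # items:    things the farmer must ferry across
      # eats:     (predator, prey) pairs that can't be left alone together
      # capacity: how many items fit in the boat besides the farmer
      items = frozenset(items)

      def safe(bank):  # a bank the farmer is NOT standing on
          return not any(a in bank and b in bank for a, b in eats)

      start = (items, "left")  # everything, farmer included, starts on the left
      queue = deque([(start, [])])
      seen = {start}
      while queue:
          (left, side), path = queue.popleft()
          if not left and side == "right":
              return path  # everything is across
          here = left if side == "left" else items - left
          for k in range(capacity + 1):  # k = 0 means the farmer rows alone
              for cargo in combinations(sorted(here), k):
                  cargo = frozenset(cargo)
                  new_left = left - cargo if side == "left" else left | cargo
                  new_side = "right" if side == "left" else "left"
                  behind = new_left if new_side == "right" else items - new_left
                  if not safe(behind):
                      continue  # this move leaves predator and prey alone
                  state = (new_left, new_side)
                  if state not in seen:
                      seen.add(state)
                      queue.append((state, path + [(new_side, tuple(sorted(cargo)))]))
      return None  # no safe sequence of crossings exists

  # Classic puzzle: the familiar seven-crossing plan.
  print(solve({"wolf", "goat", "cabbage"}, {("wolf", "goat"), ("goat", "cabbage")}))
  # Cow variant, read as dropping the wolf-goat rule: even easier.
  print(solve({"cow", "goat", "cabbage"}, {("goat", "cabbage")}))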


Replies

getoffmyyawn 10/11/2024

I've found that the River Crossing puzzle is a great way to show how LLMs break down.

For example, I tested Gemini with several versions of the puzzle that are easy to solve because they don't have restrictions such as the farmer's boat only being able to carry one passenger or item at a time.

Ask this version, "A farmer has a spouse, chicken, cabbage, and baby with them. The farmer needs to get them all across the river in their boat. What is the best way to do it?"

In my tests the LLMs nearly always assume that the boat has a carry restriction, and they come up with wild solutions involving multiple trips.
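
If you'd rather script the test than paste prompts by hand, here's a minimal sketch, assuming the OpenAI Python client; the model name is illustrative:

  # Sketch of scripting the no-restriction test; assumes the OpenAI
  # Python client (pip install openai) and OPENAI_API_KEY in the env.
  from openai import OpenAI

  client = OpenAI()

  PROMPT = ("A farmer has a spouse, chicken, cabbage, and baby with them. "
            "The farmer needs to get them all across the river in their "
            "boat. What is the best way to do it?")

  resp = client.chat.completions.create(
      model="gpt-4o",  # illustrative; swap in whichever model you're probing
      messages=[{"role": "user", "content": PROMPT}],
  )
  print(resp.choices[0].message.content)
  # No capacity limit was stated, so the right answer is one trip with
  # everyone aboard; a plan full of return trips means the model
  # pattern-matched the restricted classic instead of reading the prompt.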

chasd00 10/11/2024

What happens if you sit down and invent a logic game that is brand new and has never been documented anywhere, then ask an LLM to solve it? That, to a layman like me, seems like a good way to measure reasoning in AI.
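
A cheap approximation, without inventing a whole new game, is to randomize the items and constraints of an existing family so the exact instance can't have been memorized. A rough sketch, reusing the solve() BFS from upthread; the item names and parameters are arbitrary:

  # Sketch: randomize items and constraints so the exact instance can't
  # have been memorized, keeping only solvable draws. Reuses the solve()
  # BFS sketched upthread; names and parameters are arbitrary.
  import random

  def random_puzzle(n_items=4, n_conflicts=2, capacity=1):
      pool = ["drone", "beehive", "lantern", "ferret", "melon", "accordion"]
      items = random.sample(pool, n_items)
      for _ in range(1000):  # not every random draw is solvable, so retry
          conflicts = {tuple(random.sample(items, 2)) for _ in range(n_conflicts)}
          plan = solve(items, conflicts, capacity)
          if plan:
              return items, conflicts, plan
      raise ValueError("no solvable instance found; relax the parameters")

  items, conflicts, plan = random_puzzle()
  print("ferry across:", items)
  print("can't be left alone together:", conflicts)
  print("shortest plan:", plan)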

SonOfLilit 10/11/2024

I've been using this as my first question to any new LLM I try, and I'm quite sure nothing before GPT-4 even got close to a correct solution. Can you post a prompt that GPT-2 or GPT-3 can solve?

andrepd 10/11/2024

Meaning it's just a glorified Google.

voidUpdate 10/11/2024

I'm scared of the cows around you if they eat goats
