That's not evidence that the model is assuming anything, and this is not a brainteaser. A brainteaser would be exactly the opposite: a question about walking or driving somewhere where the twist is that the car is already there, or that a different car is involved (e.g. "my car was already at the car wash; I was asking about driving another car there to wash it!").
If the LLM were really basing its answer on a model of the world where the car is already at the car wash, and you asked it about walking or driving there, it would have to answer that there is no choice: you have to walk, since you don't have a car at your origin point.
You're right, it's not a brain teaser; it's an... anti-brain teaser? Also known as an idiotic question.
I guess the question is: how valid are idiotic questions for assessing LLM performance? Does poor performance on them predict poor performance on harder questions? (The answer seems to be no... but I have no actual idea.) Or are they something important to train on and get right, either for overall trustworthiness in viral "gotcha" situations like this one, or because sometimes people genuinely ask them?
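For what it's worth, testing this systematically rather than via viral screenshots is almost trivially cheap. Here's a minimal sketch of such an eval, assuming an OpenAI-style chat API and naive keyword grading; the model name, the questions, and the expected keywords are all illustrative, not anyone's real benchmark:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Questions with an obvious, non-trick answer, plus a keyword the
# reply should contain if the model didn't overthink it.
# These cases are made up for illustration.
CASES = [
    ("I want to wash my car. Should I walk or drive to the car wash?", "drive"),
    ("My kettle is already full. Do I need to add water before boiling it?", "no"),
]

def run_eval(model: str = "gpt-4o-mini") -> float:
    hits = 0
    for question, expected_keyword in CASES:
        reply = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": question}],
        )
        answer = reply.choices[0].message.content.lower()
        # Crude substring grading; a real eval would grade more carefully
        # (e.g. with a rubric or a second model as judge).
        hits += expected_keyword in answer
    return hits / len(CASES)

if __name__ == "__main__":
    print(f"score: {run_eval():.0%}")
```

Run across a few hundred such questions and a few models, and you'd at least have data instead of anecdotes on whether "idiotic question" performance correlates with anything.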
It might be assuming that more than one car exists in the world.