jaccola • today at 7:05 AM • 13 replies

All of the latest models I've tried actually pass this test. What I found interesting was that all of the success cases looked something like this:

"Drive. Most car washes require the car to be present to wash,..."

Only most?!

They have an inability to have a strong "opinion", probably because their post-training, and maybe the internet in general, prefer hedged answers....

Replies

Waterluvian • today at 7:08 AM

Here’s my take: boldness requires the risk of being wrong sometimes. If we decide being wrong is very bad (which I think we generally have agreed is the case for AIs) then we are discouraging strong opinions. We can’t have it both ways.

hansmayer • today at 7:31 AM

> They have an inability to have a strong "opinion" probably

What opinion? Its evaluation function simply returned the word "Most" as the most likely first word in similar sentences it was trained on. It's a perfect example of how dangerous this tech could be in a scenario where the prompter is less competent in the domain they are seeking an answer in. Let's not do the work of filling in the gaps for the snake oil salesmen of the "AI" industry by trying to explain its inherent weaknesses.

andersmurphy • today at 7:17 AM

Did you try several times per model? In my experience it's luck of the draw. All the models I tried managed to get it wrong at least once.

The models that had access to search got it right. But then we're just dealing with an indirect version of Google.

(And they got it right for the wrong reasons, i.e. this is a known question designed to confuse LLMs.)
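One way to check the "luck of the draw" point is to ask the same question many times and tally the answers instead of judging a model on a single reply. The sketch below is only illustrative: `make_fake_model` is a hypothetical stand-in that answers "Drive" with a fixed probability, `PROMPT` is a placeholder because the thread doesn't quote the exact wording, and a real run would swap in an actual model call.

```python
import random
from collections import Counter
from typing import Callable


def tally_answers(ask: Callable[[str], str], prompt: str, trials: int = 20) -> Counter:
    """Ask the same prompt `trials` times and count each distinct answer."""
    return Counter(ask(prompt).strip().lower() for _ in range(trials))


def make_fake_model(p_drive: float, seed: int = 0) -> Callable[[str], str]:
    """Hypothetical stand-in for a model call (an assumption, not a real API).

    Answers "Drive" with probability p_drive, otherwise "Walk".
    """
    rng = random.Random(seed)

    def ask(prompt: str) -> str:
        return "Drive" if rng.random() < p_drive else "Walk"

    return ask


# Placeholder: the exact prompt is not quoted in the thread.
PROMPT = "placeholder for the car wash question"

counts = tally_answers(make_fake_model(p_drive=0.8), PROMPT, trials=50)
pass_rate = counts["drive"] / sum(counts.values())
print(counts, f"pass rate: {pass_rate:.0%}")
```

A model that "passes" on one try but only answers "Drive" in some fraction of 50 runs would show up here as a pass rate below 100%.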

jl6 • today at 7:13 AM

I guess it didn’t want to rule out the existence of ultra-powerful water jets that can wash a car in sniper mode.

madeofpalk • today at 8:41 AM

I enjoyed the Deepseek response that said “If you walk there, you'll have to walk back anyway to drive the car to the wash.”

There’s a level of earnestness here that tickles my brain.

deevus • today at 7:40 AM

I tried with Opus 4.6 Extended and it failed. LLMs are non-deterministic, so I'm guessing if I try a couple of times it might succeed.

nozzlegear • today at 7:30 AM

Opus 4.6 answered with "Drive." Opus 4.6 in incognito mode (or whatever they call it) answered with "Walk."

yanis_t • today at 8:01 AM

> Most car washes...

I read it as a slightly sarcastic answer.

sneak • today at 9:54 AM

There are car wash services that will come to where your car is and wash it. It’s not wrong!

Puts • today at 7:20 AM

> Only most?!

What if AI developed sarcasm without us knowing… xD

dyauspitr • today at 7:17 AM

There are mobile car washes that come to your house.
