disillusioned • today at 8:49 AM • 2 replies • view on HN

It's also routinely failing the car wash question across all models now, which wasn't the case a month ago. :-/

I'm also seeing reports that the effort selector isn't necessarily working as intended and that the model is regressing in other ways: over-emphasizing how "difficult" a problem is to solve and choosing to avoid it because of the "time" it would take (quoted in human effort), or suggesting the "easier" path forward even if it's a hack or a kludge-filled solution.

Replies

andai • today at 11:41 AM

> over-emphasizing how "difficult" a problem is to solve and choosing to avoid it because of the "time" it would take

I heard a while back Claude refused to attempt a task for days, saying it would take weeks of work. Eventually the user convinced it to try, and it one-shotted it in 30 seconds.

_blk • today at 9:18 AM

Awesome, I didn't know about the car wash question.

Totally true. Tokens also seem to burn through much faster. More parallelism could explain some of it, but where I could work on 3-5 projects at once on the Max plan a month ago, I can't even get one to completion now on the same Opus model before the 5h session limit locks me out.
