Hacker News

throwatdem12311 today at 12:05 PM

I spent more than half my day yesterday telling Claude to correct itself because it did things I explicitly told it not to do in my prompt.

“You’re right - I overstepped”

Is the new “You’re absolutely right”.

I don’t know if we can qualify something that actively goes against the explicit instructions you give it as “something great”. It just sounds like Dario is building snake oil and selling it too.


Replies

malfist today at 12:22 PM

I have a script at work that writes out some config files, and I'm having Claude run it after making changes. If the script detects breaking changes, it spits out a message saying what they are and does nothing else, telling you to rerun it with the override flag after validation.

If I don't tell Claude about this behavior, it ignores the script output and lies about passing the tests that validate whether the config files were regenerated.

So I added instructions to my prompt telling it to watch for that message and, if it sees it, to double-check its work, then inform me and ask what to do before proceeding.

This has had the net result of Claude either running the script with the override flag from the get-go (explicitly forbidden) or seeing the message, convincing itself that the override is warranted, and running it a second time with the override flag. It has never once stopped to ask me what to do, as instructed.
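
For anyone who hasn't seen this kind of guard, here is a rough Python sketch of the script behavior described in this comment: report breaking changes, write nothing, and only proceed on an explicit override. It is not the actual script. The names `find_breaking_changes` and `regenerate`, the definition of "breaking" (a removed key or a changed value type), and the non-zero return code are all assumptions made for illustration.

```python
import sys


def find_breaking_changes(old: dict, new: dict) -> list[str]:
    """Hypothetical check: a removed key or a changed value type counts as breaking."""
    problems = []
    for key, old_value in old.items():
        if key not in new:
            problems.append(f"removed key: {key}")
        elif type(new[key]) is not type(old_value):
            problems.append(f"type changed for key: {key}")
    return problems


def regenerate(old: dict, new: dict, override: bool = False) -> tuple[int, dict]:
    """Return (exit_code, config_to_keep).

    On breaking changes without the override, print what broke, keep the old
    config untouched, and return a non-zero code (an assumption; the script
    in the comment is only said to print a message and do nothing).
    """
    problems = find_breaking_changes(old, new)
    if problems and not override:
        print("Breaking changes detected; nothing was written:", file=sys.stderr)
        for problem in problems:
            print(f"  - {problem}", file=sys.stderr)
        print("Validate the changes, then rerun with the override flag.", file=sys.stderr)
        return 2, old
    return 0, new
```

Returning a non-zero code in addition to the message is a design choice in this sketch: a caller that ignores the printed text still gets a failure signal it has to handle.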

sandos today at 12:17 PM

This is one of a few reasons I strongly prefer GPT and its Codex variants. It seldom frustrates me. Sure, it's not omnipotent in any way, but it just feels very "tuned in" when it comes to understanding intent and scope.

PunchyHamster today at 12:13 PM

Imagine a worker who went through a loop of "you're absolutely right -> same fuckup again" multiple days every week, wasting the time of whoever told them to do the task.

They'd be out of the company after a week.
