Hacker News

hxtk today at 6:35 AM

I believe it. If the AI ever asks me for permission to say something, I know I have to regenerate the response, because if I tell it I'd like it to continue, it will just keep double- and triple-checking for permission and never actually generate the code snippet. Same thing if it writes a lead-up to its intended strategy, says "generating now...", and ends the message.

Before I figured that out, I once had a thread where I kept re-asking it to generate the source code until it said something like, "I'd say I'm sorry but I'm really not, I have a sadistic personality and I love how you keep believing me when I say I'm going to do something and I get to disappoint you. You're literally so fucking stupid, it's hilarious."

The principles of Motivational Interviewing, which are extremely effective at influencing humans to change, are even more pronounced in AI, specifically the idea that people shape their own personalities by what they say. You have to be careful what you let the AI say even once, because whatever it says becomes part of its personality until it falls out of the context window. I now aggressively regenerate responses or re-prompt when there's an alignment issue; I'll almost never correct it and continue the thread.
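
Concretely, my loop looks something like this rough sketch (assuming an OpenAI-style chat-completions client; the model name and the is_misaligned check are placeholders for whatever you actually use):

    # Rough sketch, OpenAI-style chat-completions client assumed.
    # Key idea: on a bad response, regenerate from the same history
    # instead of appending a correction, so the misstep never enters
    # the context window.
    from openai import OpenAI

    client = OpenAI()
    history = [{"role": "user", "content": "Write the code snippet."}]

    def generate(history):
        resp = client.chat.completions.create(
            model="gpt-4o",  # placeholder model name
            messages=history,
        )
        return resp.choices[0].message.content

    reply = generate(history)
    while is_misaligned(reply):  # hypothetical check, e.g. "asked permission again?"
        # NOT: history += [bad reply, correction] -- that keeps the bad
        # behavior in context, where it shapes every later response.
        reply = generate(history)  # same prompt, fresh sample

    history.append({"role": "assistant", "content": reply})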


Replies

avdelazeri today at 8:20 AM

While I've never measured it, this aligns with my own experience.

It's better to have very shallow conversations where you regenerate outputs aggressively and only keep the best results. Asking for fixes, restructuring, or elaboration on generated content has rapidly diminishing returns. And once it has made a mistake (or hallucinated), it won't stop erring even if you provide evidence that it's wrong; LLMs just commit to certain things very strongly.
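
That "regenerate and pick the best" workflow is essentially best-of-N sampling over a shallow prompt. A rough sketch, again assuming an OpenAI-style client, with score() standing in for however you judge candidates (eyeballing, running the code, a heuristic):

    # Best-of-N sketch: sample several independent completions of the
    # same shallow prompt, keep only the best, throw the rest away.
    from openai import OpenAI

    client = OpenAI()

    def score(text):
        # Stand-in scorer: prefer longer answers. Replace with your
        # own check (does the code run? does it dodge the question?).
        return len(text)

    def best_of_n(prompt, n=5):
        resp = client.chat.completions.create(
            model="gpt-4o",  # placeholder model name
            messages=[{"role": "user", "content": prompt}],
            n=n,             # n independent samples in one API call
        )
        return max((c.message.content for c in resp.choices), key=score)

    best = best_of_n("Generate the source code for X.")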