Opus 4.7 was already trying hard to appear honest. Most conversations I have with it about advice or focusing an opinion often include "my honest take" or "my honest opinion".
The problem is that once I asked it "I'm thinking about A or B" twice, once with "I like A more but suspect B would be best" and a second time with them reversed. Not surprisingly, both times it chose the one I said I suspected was best as it's honest opinion.
I wish I knew how to make it regressively verify its assumptions, like a kind of hook but firing before a sentence is written, or perhaps after and then corrected. I feel like it assuming things clearly wrong is its biggest weakness.