How exactly are we asking for the confidence level?
If you give the model the image and a prior prediction, what can it tell you? Asking it to produce a 1-10 score in the same token stream as the actual task seems like a flawed strategy.
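
To make the alternative concrete, here is a rough sketch of splitting the two steps: one call for the task, then a separate call that sees the image plus the prior prediction and only has to judge it. The `model(prompt, image)` callable, the prompt wording, and `stub_model` are all placeholders I made up for illustration, not any real API, and I'm not claiming this is how the current setup works.

```python
from typing import Callable, Tuple

# Hypothetical interface: any callable that takes a prompt and image bytes
# and returns the model's text reply. Not a real library API.
Model = Callable[[str, bytes], str]


def predict_then_verify(model: Model, image: bytes, task_prompt: str) -> Tuple[str, bool]:
    # Pass 1: the task alone, with no confidence request in the prompt.
    prediction = model(task_prompt, image).strip()

    # Pass 2: a fresh call that sees the image and the prior prediction
    # and only has to judge it.
    verify_prompt = (
        f"Task: {task_prompt}\n"
        f"Proposed answer: {prediction}\n"
        "Is the proposed answer correct for this image? Reply YES or NO."
    )
    verdict = model(verify_prompt, image).strip().upper()
    return prediction, verdict.startswith("YES")


def stub_model(prompt: str, image: bytes) -> str:
    # Stand-in for a real model so the sketch runs end to end.
    return "YES" if "Proposed answer" in prompt else "cat"


prediction, accepted = predict_then_verify(stub_model, b"", "What animal is this?")
```

Whether the second pass is actually better calibrated than an inline 1-10 score is something you'd have to measure; the sketch only shows the separation.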