This is directional only. Each model self-reports a confidence for its answer, and the strength is a linear combination of that confidence plus a bonus for every model that ended up in the cluster.
Models are notoriously poorly calibrated, especially when self-reporting confidence, so I would treat the number lightly. Hopefully I can study this a bit later on!
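
To make the shape of the score concrete, here is a minimal sketch of what such a strength calculation could look like. Everything in it is an assumption for illustration: the names `ModelAnswer` and `cluster_strength`, the choice to average the confidences within a cluster, and the weights `confidence_weight` and `cluster_bonus` are all hypothetical, not the actual values or code.

```python
from dataclasses import dataclass


@dataclass
class ModelAnswer:
    model: str         # hypothetical model identifier
    confidence: float  # self-reported confidence, assumed to be in [0, 1]


def cluster_strength(answers, confidence_weight=1.0, cluster_bonus=0.1):
    """Score one cluster of agreeing answers.

    Assumed form: weight * mean self-reported confidence
                  + a fixed bonus per model in the cluster.
    """
    if not answers:
        return 0.0
    mean_confidence = sum(a.confidence for a in answers) / len(answers)
    return confidence_weight * mean_confidence + cluster_bonus * len(answers)


cluster = [ModelAnswer("model-a", 0.8), ModelAnswer("model-b", 0.6)]
print(cluster_strength(cluster))  # roughly 0.9 with the default weights
```

With a form like this, two clusters with the same average confidence are separated only by how many models agree, which is arguably the more trustworthy signal given how shaky the self-reported numbers are.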