The reason for the "No explanations, no qualifiers" in the prompt was to force the models to put the claim in one of the four buckets and answer with the bucket name only. It's a purely quantitative analysis (the first in a series), and it does indeed lack the qualitative aspect.
Sure, but people are drawing conclusions beyond "LLMs said different words". They're trying to use it to analyze whether the LLMs were wrong about the underlying facts, and that information isn't available to us.
Use structured output, e.g. { "answer": "Misleading", "reason": "Almonds..." }
Make reason optional and instruct the model to provide one only for the middle buckets, "Mostly True" or "Misleading".
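For what it's worth, here is a rough sketch of how that could be enforced on the receiving side. It is only an illustration: the thread names just "Mostly True" and "Misleading", so the outer bucket names ("True" and "False") are my assumption, and `Verdict` and `parse_verdict` are hypothetical names, not anything from the original analysis.

```python
import json
from dataclasses import dataclass
from typing import Optional

# Assumption: only "Mostly True" and "Misleading" are named in the thread;
# "True" and "False" are guessed names for the two outer buckets.
BUCKETS = ("True", "Mostly True", "Misleading", "False")
MIDDLE_BUCKETS = {"Mostly True", "Misleading"}


@dataclass
class Verdict:
    answer: str
    reason: Optional[str] = None


def parse_verdict(raw: str) -> Verdict:
    """Parse one model reply of the form {"answer": ..., "reason": ...}."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")

    answer = data.get("answer")
    if answer not in BUCKETS:
        raise ValueError(f"unknown bucket: {answer!r}")

    # The reason is optional, and only kept for the middle buckets.
    reason = data.get("reason")
    if answer not in MIDDLE_BUCKETS:
        reason = None
    return Verdict(answer=answer, reason=reason)


if __name__ == "__main__":
    print(parse_verdict('{"answer": "Misleading", "reason": "Almonds..."}'))
    print(parse_verdict('{"answer": "True"}'))
```

The bucket counts stay as comparable as before, since the answer field is still one of four fixed strings, while the reason field gives at least some of the qualitative signal for the ambiguous cases.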