Hacker News

NitpickLawyer, today at 4:04 PM

A third alternative is to take the best of both worlds: have the model respond in free form, then pass that response to a structured-output API and ask for JSON. It is more expensive, but gives better overall results. You can also cross-check your heuristic parsing against the structured output, and retry or alert on mismatches.
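A rough sketch of that cross-check, assuming both the heuristic parser and the structured-output call end up as flat dicts of field values. The names `normalize` and `find_mismatches` are made up for illustration, and the comparison is a plain string comparison after trimming and lowercasing, so `"12.50"` and `12.5` would still count as a mismatch:

```python
from typing import Any


def normalize(value: Any) -> str:
    """Lowercase and collapse whitespace so trivial formatting differences are ignored."""
    return " ".join(str(value).split()).lower()


def find_mismatches(
    heuristic: dict[str, Any], structured: dict[str, Any]
) -> dict[str, tuple[Any, Any]]:
    """Return the fields present in both results whose normalized values differ."""
    mismatches = {}
    for key in heuristic.keys() & structured.keys():
        if normalize(heuristic[key]) != normalize(structured[key]):
            mismatches[key] = (heuristic[key], structured[key])
    return mismatches
```

An empty result means the two passes agree on every shared field; anything else is a candidate for a retry or an alert.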


Replies

theoli, today at 6:18 PM

I am doing this with good success parsing receipts with ministral3:14b. The first prompt describes the data being sought and asks for it to be placed at the end of the response. The output format varies between JSON, bulleted lists, and name: value pairs; I was never able to find a reliable way to get just JSON.
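For anyone who wants to parse that first response heuristically anyway, here is a minimal best-effort sketch covering the three formats mentioned. It is an assumption about what such a parser could look like, not what is actually run here; `parse_trailing_fields` is a hypothetical name, and any prose line of the form "word: text" will also be picked up as a field:

```python
import json
import re

# Optional bullet marker, then "name: value" on a single line.
FIELD_LINE = re.compile(r"^\s*(?:[-*•]\s*)?([A-Za-z][\w ]*?)\s*:\s*(.+?)\s*$")


def parse_trailing_fields(response: str) -> dict[str, str]:
    """Extract fields from a response that uses JSON, bullets, or name: value lines."""
    # Try JSON first: take the span from the first "{" to the last "}".
    start, end = response.find("{"), response.rfind("}")
    if start != -1 and end > start:
        try:
            data = json.loads(response[start:end + 1])
            if isinstance(data, dict):
                return {str(k).lower(): str(v) for k, v in data.items()}
        except json.JSONDecodeError:
            pass  # not valid JSON; fall through to line-based parsing

    fields = {}
    for line in response.splitlines():
        match = FIELD_LINE.match(line)
        if match:
            fields[match.group(1).lower()] = match.group(2)
    return fields
```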

The second pass is configured for structured output via guided decoding, and is asked only to copy the field values from the analyzer's response into JSON fitting a specified schema.
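Roughly, the second pass could be shaped like the sketch below. Everything in it is illustrative: the schema fields are invented, and `generate(prompt, schema)` is a placeholder for whatever guided-decoding or structured-output call your backend provides, so no particular server API is assumed:

```python
import json
from typing import Callable

# Hypothetical schema; the real field names depend on what the first prompt asks for.
RECEIPT_SCHEMA = {
    "type": "object",
    "properties": {
        "vendor": {"type": "string"},
        "date": {"type": "string"},
        "total": {"type": "string"},
    },
    "required": ["vendor", "date", "total"],
    "additionalProperties": False,
}


def build_second_pass_prompt(analysis: str) -> str:
    """Ask only for transcription of values already stated in the first response."""
    return (
        "Copy the field values from the analysis below into JSON matching the schema. "
        "Do not infer values that are not stated.\n\n"
        f"Analysis:\n{analysis}"
    )


def second_pass(analysis: str, generate: Callable[[str, dict], str]) -> dict:
    """Run the structured pass and check that every required field came back."""
    raw = generate(build_second_pass_prompt(analysis), RECEIPT_SCHEMA)
    data = json.loads(raw)
    missing = [key for key in RECEIPT_SCHEMA["required"] if key not in data]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    return data
```

Keeping the model call behind a plain callable makes it easy to swap backends or to test the pipeline with a stub.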

I have processed several hundred receipts this way with very high accuracy: 99.7% of extracted fields are correct. Unfortunately, it still needs human review, because I can't seem to get a VLM to spot the errors in the few receipts that have them. But this setup does save a lot of time.