Instead of a markdown -> LLM pipeline to get JSON, you can just train a slightly bigger model and use constrained decoding so it emits JSON right away. https://huggingface.co/nanonets/Nanonets-OCR2-3B
We recently published a cookbook for constrained decoding here: https://nanonets.com/cookbooks/structured-llm-outputs/
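
For anyone unfamiliar with the idea, here is a rough toy sketch of constrained decoding in Python. It is not taken from the cookbook or from the model's code: the vocabulary, the `GRAMMAR` state machine, and `fake_logits` (a stand-in for a real model forward pass) are all made up for illustration. The point is only that at each step you mask out every token the grammar does not allow, then pick from what remains.

```python
import json

# Toy vocabulary (hypothetical); a real tokenizer has tens of thousands of entries.
VOCAB = ['{', '}', '"total"', ':', '1', '2', '7', 'Sure', ' here', '<eos>']

# Hypothetical grammar for {"total":<digits>} written as a state machine:
# state -> {allowed token: next state}
GRAMMAR = {
    "start": {'{': "key"},
    "key": {'"total"': "colon"},
    "colon": {':': "first_digit"},
    "first_digit": {'1': "digits", '2': "digits", '7': "digits"},
    "digits": {'1': "digits", '2': "digits", '7': "digits", '}': "end"},
    "end": {'<eos>': "done"},
}


def fake_logits(prefix):
    """Stand-in for a model forward pass: one score per vocabulary token.

    The scores deliberately favour chatty tokens such as 'Sure', so that
    unconstrained greedy decoding would not produce JSON.
    """
    scores = {'Sure': 5.0, ' here': 4.0, '{': 1.0, '"total"': 1.0, ':': 1.0,
              '1': 2.0, '2': 1.5, '7': 1.0, '}': 0.5, '<eos>': 0.1}
    # After two digits, make the closing brace more attractive than more digits.
    if sum(tok.isdigit() for tok in prefix) >= 2:
        scores['}'] = 3.0
    return scores


def constrained_decode(max_steps=20):
    """Greedy decoding restricted to the tokens the grammar allows."""
    state, out = "start", []
    for _ in range(max_steps):
        if state == "done":
            break
        allowed = GRAMMAR[state]          # tokens that keep the output valid
        scores = fake_logits(out)
        # Masking step: only allowed tokens compete, everything else is ignored.
        tok = max(allowed, key=lambda t: scores[t])
        state = allowed[tok]
        if tok != '<eos>':
            out.append(tok)
    return "".join(out)


result = constrained_decode()
print(json.loads(result))  # parses, because every step stayed inside the grammar
```

Real implementations do the same masking on the model's logits, with the grammar usually compiled from a JSON schema rather than written by hand.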