The malformed JSON problem is real and tends to be model-specific. We run multiple LLM providers in production (OpenAI, Groq/Llama, Google AI, Ollama) and there's a meaningful gap in schema adherence between frontier models and smaller/cheaper ones. Nested arrays with optional fields are particularly tricky — smaller models will nail 19 out of 20 objects and silently mangle the last one in ways that are hard to predict.
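To make that concrete, a per-item check is usually enough to catch the one mangled object without throwing away the rest. A minimal sketch, pure stdlib; `split_valid` and its `required` field map are hypothetical names, not from any particular library:

```python
import json


def split_valid(raw: str, required: dict) -> tuple[list, list]:
    """Parse a JSON array and separate well-formed objects from mangled ones.

    `required` maps field name -> expected type; optional fields are ignored,
    which matters because they're exactly where smaller models drift.
    """
    try:
        items = json.loads(raw)
    except json.JSONDecodeError:
        return [], []  # whole payload unusable; nothing to recover
    valid, mangled = [], []
    for item in items if isinstance(items, list) else [items]:
        ok = isinstance(item, dict) and all(
            isinstance(item.get(field), kind) for field, kind in required.items()
        )
        (valid if ok else mangled).append(item)
    return valid, mangled


# Typical failure: 19 clean objects, then one with a field coerced to a string.
raw = '[{"name": "a", "score": 1}, {"name": "b", "score": "oops"}]'
good, bad = split_valid(raw, {"name": str, "score": int})
```

Logging the `mangled` list per model turns "hard to predict" into a concrete error taxonomy over time.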
One pattern that's helped us: decomposing complex schemas into multiple simpler sequential extractions rather than one large schema. Less impressive as a demo, but noticeably more reliable in production when you're cost-optimizing with smaller models. The partial recovery approach here (keeping valid items even when one fails) is exactly the right instinct for keeping pipelines alive.
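The decomposition pattern is easy to sketch. Assuming a generic text-in/text-out model call (the `complete` callable and `extract_in_steps` name below are placeholders, not a real API), each field gets its own small prompt, and a failed step degrades to `None` instead of poisoning the whole result:

```python
import json
from typing import Callable


def extract_in_steps(complete: Callable[[str], str], text: str,
                     steps: dict[str, str]) -> dict:
    """Run several small extractions instead of one large nested schema.

    `steps` maps output field -> extraction instruction. Each step is an
    independent model call, so one malformed reply costs one field, not all.
    """
    out = {}
    for field, instruction in steps.items():
        raw = complete(f"{instruction}\nReply with JSON only.\n\n{text}")
        try:
            out[field] = json.loads(raw)
        except json.JSONDecodeError:
            out[field] = None  # partial recovery: keep the pipeline alive
    return out
```

The trade-off is N model calls instead of one, but with cheap models the extra calls often cost less than the retries a large schema would trigger.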