Interesting! .TXT has the opposite conclusion, that structured output improves performance:
https://blog.dottxt.ai/say-what-you-mean.html
https://blog.dottxt.ai/prompt-efficiency.html
This also matches my own experiences.
Yup. I instantly linked these because the multiple papers that claim structured outputs harm quality are not just wrong, but fatally damaging to the whole AI ecosystem, especially AI agents.
There are places where structured outputs harm creativity, but usually that's a decoding-time problem, which is likewise solved with better sampling, like they talk about in this paper: https://arxiv.org/abs/2410.01103
Claims of harmed reasoning performance are really evidence that (1) your structured generation backend is bad, (2) there are some shenanigans/interactions with temperature/samplers (this is the most common by far), or (3) you are bad at benchmarking.
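To make point (2) concrete, here is a toy sketch of what a structured generation backend does at each step: mask out tokens the schema forbids, then renormalize over what is left. The vocabulary, logits, and the `constrained_probs` helper are all made up for illustration; this is not how any particular library implements it. The point is just that temperature is applied to the surviving tokens, so a sampler setting tuned for free-form text can behave quite differently once the mask is in place.

```python
import math

def constrained_probs(logits, allowed, temperature=1.0):
    """Softmax over only the allowed tokens, after temperature scaling.

    logits: dict mapping token -> raw logit (hypothetical toy vocabulary)
    allowed: set of tokens the grammar/schema permits at this step
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    if not allowed:
        raise ValueError("at least one token must be allowed")
    # Disallowed tokens are dropped entirely, which is equivalent to
    # setting their logits to -inf before the softmax.
    scaled = {tok: logits[tok] / temperature for tok in allowed}
    peak = max(scaled.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(val - peak) for tok, val in scaled.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}

# Toy next-token logits: the model "wants" to start with chatty prose,
# but a JSON schema only permits '{' or '"' at this position.
toy_logits = {"Sure": 3.0, "{": 2.0, '"': 0.5, "}": -1.0}
allowed_tokens = {"{", '"'}

probs_t1 = constrained_probs(toy_logits, allowed_tokens, temperature=1.0)
probs_t2 = constrained_probs(toy_logits, allowed_tokens, temperature=2.0)
```

With the mask applied, "Sure" gets no probability mass at all, and raising the temperature shifts mass toward the less likely allowed token. If a benchmark compares constrained and unconstrained runs without controlling for that, the difference can look like a reasoning regression.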
Same for me. Using structured output was much better than without.