> I wonder what the underlying cause is
It responds with the statistically most probable text based on its training data, which happens to be different with the errors vs without. I suspect high-fidelity diagramming requires a different attention architecture from the common ones used in sentence-optimized models.