Hacker News

overfeed · today at 6:34 PM · 0 replies

> I wonder what the underlying cause is

It responds with the statistically most probable text given its training data, and that most-probable continuation happens to differ when the errors are present versus when they aren't. I suspect high-fidelity diagramming requires a different attention architecture from the common ones used in sentence-optimized models.
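To make the "statistically most probable text" point concrete, here is a toy sketch (not any real model's decoder) of greedy decoding: at each step the token with the highest learned probability is emitted, so the output is determined entirely by the distribution the model picked up from training data. The bigram-style table below is a hypothetical stand-in for that distribution.

```python
# Toy sketch: greedy next-token decoding over a hypothetical
# learned distribution. The output is fully determined by the
# probabilities, illustrating why a model can "confidently"
# produce whatever continuation its training data made likeliest.
def greedy_decode(next_token_probs, prompt, steps):
    """next_token_probs maps a 2-token context tuple to {token: probability}."""
    tokens = list(prompt)
    for _ in range(steps):
        dist = next_token_probs.get(tuple(tokens[-2:]), {})
        if not dist:
            break  # no learned continuation for this context
        tokens.append(max(dist, key=dist.get))  # pick the most probable token
    return tokens

# Hypothetical bigram-style probabilities standing in for training data.
probs = {
    ("the", "cat"): {"sat": 0.6, "ran": 0.4},
    ("cat", "sat"): {"down": 0.9, "up": 0.1},
}
print(greedy_decode(probs, ["the", "cat"], 2))  # → ['the', 'cat', 'sat', 'down']
```

Change any probability in the table and the emitted text changes with it, which is the commenter's point: error-laden and error-free diagrams simply sit at different points in the learned distribution.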