It happens when you ask it about esoteric information or under-documented behavior that conflicts with its training data. Here's an example. I tested this today on Opus 4.8, and it accuses the user of being wrong even though the behavior is documented [0].
---
Why does Starship pressurize the liquid oxygen tank with gaseous preburner exhaust, which is oxygen rich but is contaminated by H2O and CO2 waste products?
They are dumping literal tons of H2O and CO2 into the liquid oxygen tank, which freeze and clog up the intake filters. SpaceX has lost several boosters due to this issue.
Why would SpaceX choose such a failure-prone design?
---
And this is the Opus 4.8 output: https://imgur.com/a/S9XWYFA
It's interesting to read its response, knowing it's completely and confidently wrong.
[0] https://manifold.markets/JessRiedel/did-ift2-or-3-use-prebur...