GPT 5.4 straight up just dies with broken API responses sometimes, let alone when it struggles with an even moderately complex task.
I still can't get a good mental model for when these things will work well and when they won't. Really does feel like gambling...