It's gotten more and more shippable, especially with the latest generation (Codex 5.1, Sonnet 4.5, now Opus 4.5). My metric is "wtfs per line", and it's been decreasing rapidly.
My current preference is Codex 5.1 (Sonnet 4.5 as a close second, though it got really dumb today for "some reason"). It's been good to the point where I shipped multiple projects with it without a problem (with eg https://pine.town being one I made without me writing any code).
Have you tried Gemini 3 yet? I haven't done any coding with it, but on other tasks I've been impressed compared to gpt 5 and Sonnet 4.5.