One thing I’ve noticed when comparing these models is that “quality” and “realism” don’t always move together.
Some models are very strong at sharp details and localized edits, but they can break global lighting consistency: shadows, reflections, or overall scene illumination drift in subtle ways. GPT-Image seems to trade a bit of micro-detail for better global coherence, especially in lighting, which makes composites feel more believable even if they’re not pixel-perfect.
It’s hard to capture this in benchmarks, but for real-world editing workflows it ends up mattering more than I initially expected.