I just tested GPT Image 1.5. I would say the image quality is on par with NBP in my tests (which is surprising, as the images in their trailer video are bad), but the prompt adherence is way worse, and its "world model", if you want to call it that, is worse. For instance, I asked it for two people in a rowboat and it had two people, but the boat was more like a coracle and they would barely fit inside it.
Also: SUPER ANNOYING. It seems every time you give it a modification prompt, it erases the whole conversation leading up to the new pic? Like... all the old edits vanish??
I added "shaky amateur badly composed crappy smartphone photo of ____" to the start of my prompts to make them look more natural.
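If you want to apply that prefix to a batch of prompts, here is a rough sketch. The prefix text is quoted from the comment above; the `naturalize` helper and its name are made up for illustration and are not part of any image API.

```python
# Prefix quoted from the comment above; everything else is an assumption.
PREFIX = "shaky amateur badly composed crappy smartphone photo of"


def naturalize(subject: str) -> str:
    """Prepend the 'amateur photo' prefix to a subject description."""
    return f"{PREFIX} {subject.strip()}"


if __name__ == "__main__":
    # Example subjects; swap in whatever you are actually prompting for.
    for subject in ["two people in a rowboat", "a cat on a windowsill"]:
        print(naturalize(subject))
```

The resulting strings can then be pasted into, or sent to, whichever image model you are testing.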
Counterpoint from someone on the Musk site: https://x.com/flowersslop/status/2001007971292332520
I actually just finished running the Text-to-Image benchmark a few minutes ago. This matches my own testing as well. GPT-Image 1.5 is clearly a step up as an editing model, but it performed worse in purely generative tasks compared to its predecessor, dropping from 11 to 9 (out of 14).
Comparing NB Pro, GPT Image 1, and GPT Image 1.5
https://genai-showdown.specr.net/?models=o4,nbp,g15