The pace of commoditization in image generation is wild. Every 3-4 months the SOTA shifts, and last quarter's breakthrough becomes a commodity API.
What's interesting is that the bottleneck is no longer the model — it's the person directing it. Knowing what to ask for and recognizing when the output is good enough matters more than which model you use. Same pattern we're seeing in code generation.
SOTA shifts, yes. But the average person doing the work has been very happy with SDXL-based models. And that was released two years ago.
The fight right now, outside of API SOTA, is over who will replace SDXL as the “community preference.”
It’s now a three-way race between Flux2 Klein, Z-Image, and Qwen2.
I'm happy the models are becoming a commodity, but we still have a long way to go.
I want the ability to lean into any image and tweak it like clay.
I've been building open source software to orchestrate the frontier editing models (skip to halfway down), but it would be nice if the models were built around the software manipulation workflows:
https://getartcraft.com/news/world-models-for-film