I am amazed, though not entirely surprised, that these models keep getting smaller while the quality and effectiveness increase. Z-Image Turbo is wild; I'm looking forward to trying this one out.
An older thread on this has a lot of comments: https://news.ycombinator.com/item?id=46046916
I haven’t gotten around to adding Klein to my GenAI Showdown site yet, but if it’s anything like Z-Image Turbo, it should perform extremely well.
For reference, Z-Image Turbo scored 4 out of 15 points on GenAI Showdown. I’m aware that doesn’t sound like much, but given that one of the largest models, Flux.2 (32b), only managed to outscore ZiT (a 6b model) by a single point and is significantly heavier-weight, that’s still damn impressive.
Local model comparisons only:
> FLUX.2 [klein] 4B The fastest variant in the Klein family. Built for interactive applications, real-time previews, and latency-critical production use cases.
I wonder what actually counts as a "latency-critical production use case" for image generation?
It cannot create an image of a pogo stick.
I was trying to get it to create an image of a tiger jumping on a pogo stick, which turned out to be way beyond its capabilities; it can't even render a pogo stick in isolation.
If we think of GenAI models as a form of compression: text generally compresses extremely well, while images and video do not. Yet state-of-the-art text-to-image and text-to-video models are often much smaller (in parameter count) than large language models like Llama-3. Maybe vision models are small because we're not actually compressing very much of the visual world. The training data covers a narrow, human-biased manifold of common scenes, objects, and styles, while the combinatorial space of visual reality remains largely unexplored. I'm curious what lies outside that human-biased manifold.
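The "text compresses well, high-entropy data doesn't" point above is easy to demonstrate with a generic compressor. A minimal sketch (the repeated sentence and the random bytes are just illustrative stand-ins for redundant text and unstructured pixel data, not anything from the models discussed):

```python
import os
import zlib

# Redundant natural-language text: a generic compressor shrinks it a lot.
text = ("The quick brown fox jumps over the lazy dog. " * 200).encode()

# High-entropy bytes as a stand-in for unstructured image data:
# essentially incompressible by a general-purpose compressor.
noise = os.urandom(len(text))

text_ratio = len(zlib.compress(text)) / len(text)
noise_ratio = len(zlib.compress(noise)) / len(noise)

print(f"text ratio:  {text_ratio:.3f}")   # far below 1.0
print(f"noise ratio: {noise_ratio:.3f}")  # near (or slightly above) 1.0
```

Real photographs sit somewhere between these extremes, which is why lossy codecs exploit perceptual structure rather than byte-level redundancy.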
I appreciate that they released a smaller version that is actually open source. It creates a lot more opportunities when you do not need a massive budget just to run the software. The speed improvements look pretty significant as well.
2026 will be the year of small/open models
damn, they really counterattacked after the Z-Image release huh
good competition breeds innovation
Flux2 Klein isn't some generational leap or anything. It's good, but let's be honest, this is an ad.
What will be really interesting to me is the full release of Z-Image. If that goes the way it's looking, it'll be a natural-language SDXL 2.0, which seems to be what people really want.
Releasing the Turbo/distilled finetune months ago was a genius move, really. It hurt the Flux and Qwen releases on the implication of a possible future release alone.
If this was intentional, I can’t think of the last time I saw such shrewd marketing.
Neat, I really enjoyed Flux 1. Currently using Z-Image Turbo for messing around.
I will wait for Invoke to add Flux2 Klein.