Hacker News

Z-Image: Powerful and highly efficient image generation model with 6B parameters

200 points | by doener | last Sunday at 11:36 AM | 68 comments

Comments

vunderba | today at 5:36 PM

I've done some preliminary testing with Z-Image Turbo in the past week.

Thoughts

- It's fast (~3 seconds on my RTX 4090)

- Surprisingly capable of maintaining image integrity even at high resolutions (1536x1024, sometimes 2048x2048)

- The adherence is impressive for a 6B parameter model

Some tests (2 / 4 passed):

https://imgpb.com/exMoQ

Personally, I find it works better as a refiner model downstream of Qwen-Image 20B, which has significantly better prompt understanding but an unnatural "smoothness" to its generated images.
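That two-stage workflow could be sketched roughly like this with diffusers: generate a draft with the stronger prompt-follower, then re-denoise it at low strength with the Turbo model. The model IDs, AutoPipeline support for both models, and the strength value are all assumptions here, not confirmed usage; check each repo's quick start.

```python
# Hypothetical base + refiner pipeline. Model ids and diffusers
# AutoPipeline support for these models are assumptions.
BASE_MODEL = "Qwen/Qwen-Image"              # assumed Hugging Face id
REFINER_MODEL = "Tongyi-MAI/Z-Image-Turbo"  # assumed Hugging Face id
REFINE_STRENGTH = 0.35  # low denoise: keep composition, redo texture

def generate(prompt: str):
    # imported lazily so the sketch can be read without torch installed
    import torch
    from diffusers import (
        AutoPipelineForImage2Image,
        AutoPipelineForText2Image,
    )

    # stage 1: draft image from the model with better prompt adherence
    base = AutoPipelineForText2Image.from_pretrained(
        BASE_MODEL, torch_dtype=torch.bfloat16
    ).to("cuda")
    draft = base(prompt=prompt, width=1024, height=1024).images[0]

    # stage 2: img2img pass at low strength to restore natural texture
    refiner = AutoPipelineForImage2Image.from_pretrained(
        REFINER_MODEL, torch_dtype=torch.bfloat16
    ).to("cuda")
    return refiner(
        prompt=prompt, image=draft, strength=REFINE_STRENGTH
    ).images[0]

if __name__ == "__main__":
    generate("a weathered fisherman mending nets at dawn").save("out.png")
```

The key knob is `strength`: low values keep the base model's composition and only redo fine detail, which is what a "refiner" pass wants.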

muglug | today at 5:59 PM

The [demo PDF](https://github.com/Tongyi-MAI/Z-Image/blob/main/assets/Z-Ima...) has ~50 photos of attractive young women sitting/standing alone, and exactly two photos featuring young attractive men on their own.

It's incredibly clear who the devs assume the target market is.

danielbln | today at 5:33 PM

We've come a long way with these image models, and the things you can do with a paltry 6B parameters are super impressive. The community has adopted this model wholesale and left Flux (2) by the wayside. It helps that Z-Image isn't censored, whereas BFL (makers of Flux 2) dedicated about a fifth of their press release to how "safe" (read: censored and lobotomized) their model is.

reactordev | today at 10:02 PM

My issue with this model is that it keeps producing Chinese people and Chinese text. I have to go out of my way to specify exactly what ethnicity people should be.

If I say "a man", it's fine. A Black man, no problem. It's when I add context and instructions that it seems to default to a Chinese man. Which is fine, but I'd like to see it trained on a wider variety of people so it can create more diverse images. For non-people it's amazingly good.

xnx | today at 5:34 PM

Z-Image seems to be the first successor to Stable Diffusion 1.5 that delivers better quality, capability, and extensibility across the board in an open model that can feasibly run locally. Excitement is high and an ecosystem is forming fast.

khimaros | today at 6:08 PM

i have been testing this on my Framework Desktop. ComfyUI generally causes an amdgpu kernel fault after about 40 steps (across multiple prompts), so i spent a few hours building a workaround: https://github.com/comfyanonymous/ComfyUI/pull/11143

overall it's fun and impressive. decent results using LoRA. you can achieve good-looking results with as few as 8 inference steps, which takes 15-20 seconds on a Strix Halo. i also created a llama.cpp inference custom node for prompt enhancement, which has been helping with overall output quality.
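the prompt-enhancement idea can be sketched without ComfyUI at all: llama.cpp's `llama-server` exposes a `/completion` HTTP endpoint, so a few lines of stdlib Python can ask a local LLM to expand a terse prompt before it reaches the image model. the instruction wording, port, and sampling settings below are my assumptions, not what the actual custom node does.

```python
# Sketch: prompt enhancement against a local llama.cpp server.
# Assumes llama-server is running on its default port (8080); the
# instruction template and parameters are illustrative choices.
import json
import urllib.request

LLAMA_URL = "http://127.0.0.1:8080/completion"

def build_request(user_prompt: str) -> dict:
    """Wrap the user's prompt in an instruction asking the LLM to
    expand it into a detailed image-generation prompt."""
    return {
        "prompt": (
            "Rewrite the following image prompt with concrete details "
            "about lighting, composition, and style. Reply with the "
            "rewritten prompt only.\n\n"
            f"Prompt: {user_prompt}\nRewritten:"
        ),
        "n_predict": 128,
        "temperature": 0.7,
    }

def enhance(user_prompt: str) -> str:
    """POST to llama-server and return the enhanced prompt text."""
    req = urllib.request.Request(
        LLAMA_URL,
        data=json.dumps(build_request(user_prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["content"].strip()
```

the enhanced string then just replaces the raw prompt in whatever text-to-image call you're making.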

nine_k | today at 5:40 PM

It's amazing how much knowledge about the world fits into 16 GiB of the distilled model.

xfalcox | today at 6:20 PM

We have vLLM for running text LLMs in production. What is the equivalent for this model?

thih9 | today at 8:04 PM

As an AI outsider with a recent 24GB MacBook, can I follow the quick start[1] steps from the repo and expect decent results? How much time would it take to generate a single medium-quality image?

[1]: https://github.com/Tongyi-MAI/Z-Image?tab=readme-ov-file#-qu...

zkmon | today at 5:38 PM

Just want to learn: who actually needs or buys generated images?

bilsbie | today at 9:44 PM

What kind of rig is required to run this?

Copenjin | today at 5:08 PM

Very good. Not always perfect with text or with following the prompt exactly, but it's 6B, so... impressive.

pawelduda | today at 5:10 PM

Did anyone test it on a 5090? I saw some 30xx reports and it seemed very fast.

cubefox | today at 9:26 PM

I'm particularly impressed that they seem to aim for photorealism rather than the semi-realistic AI look common to many text-to-image models.

BoredPositron | today at 7:51 PM

I wish they would have used the WAN VAE.

idontwantthis | today at 6:43 PM

Does it run on Apple silicon?
