Hacker News

Z-Image: Powerful and highly efficient image generation model with 6B parameters

200 points | by doener | last Sunday at 11:36 AM | 68 comments

Comments

vunderba | today at 5:36 PM

I've done some preliminary testing with Z-Image Turbo in the past week.

Thoughts

- It's fast (~3 seconds on my RTX 4090)

- Surprisingly capable of maintaining image integrity even at high resolutions (1536x1024, sometimes 2048x2048)

- The adherence is impressive for a 6B parameter model

Some tests (2 / 4 passed):

https://imgpb.com/exMoQ

Personally, I find it works better as a refiner model downstream of Qwen-Image 20B, which has significantly better prompt understanding but an unnatural "smoothness" to its generated images.
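That two-stage workflow could be sketched roughly like this with diffusers: generate a draft with the stronger prompt-follower, then re-denoise it at low strength with the Turbo model. The model IDs, AutoPipeline support for both models, and the strength value are all assumptions here, not confirmed usage; check each repo's quick start.

```python
# Hypothetical base + refiner pipeline. Model ids and diffusers
# AutoPipeline support for these models are assumptions.
BASE_MODEL = "Qwen/Qwen-Image"              # assumed Hugging Face id
REFINER_MODEL = "Tongyi-MAI/Z-Image-Turbo"  # assumed Hugging Face id
REFINE_STRENGTH = 0.35  # low denoise: keep composition, redo texture

def generate(prompt: str):
    # imported lazily so the sketch can be read without torch installed
    import torch
    from diffusers import (
        AutoPipelineForImage2Image,
        AutoPipelineForText2Image,
    )

    # stage 1: draft image from the model with better prompt adherence
    base = AutoPipelineForText2Image.from_pretrained(
        BASE_MODEL, torch_dtype=torch.bfloat16
    ).to("cuda")
    draft = base(prompt=prompt, width=1024, height=1024).images[0]

    # stage 2: img2img pass at low strength to restore natural texture
    refiner = AutoPipelineForImage2Image.from_pretrained(
        REFINER_MODEL, torch_dtype=torch.bfloat16
    ).to("cuda")
    return refiner(
        prompt=prompt, image=draft, strength=REFINE_STRENGTH
    ).images[0]

if __name__ == "__main__":
    generate("a weathered fisherman mending nets at dawn").save("out.png")
```

The key knob is `strength`: low values keep the base model's composition and only redo fine detail, which is what a "refiner" pass wants.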

muglug | today at 5:59 PM

The [demo PDF](https://github.com/Tongyi-MAI/Z-Image/blob/main/assets/Z-Ima...) has ~50 photos of attractive young women sitting/standing alone, and exactly two photos featuring young attractive men on their own.

It's incredibly clear who the devs assume the target market is.

danielbln | today at 5:33 PM

We've come a long way with these image models, and the things you can do with a paltry 6B parameters are super impressive. The community has adopted this model wholesale and left Flux (2) by the wayside. It helps that Z-Image isn't censored, whereas BFL (makers of Flux 2) dedicated about a fifth of their press release to how "safe" (read: censored and lobotomized) their model is.

reactordev | today at 10:02 PM

My issue with this model is that it keeps producing Chinese people and Chinese text. I have to go out of my way to specify exactly what ethnicity people should be.

If I say "a man", it's fine. A Black man, no problem. It's when I add context and instructions that it seems to default to a Chinese man. Which is fine, but I'd like to see it trained on a wider variety of people so it can create more diverse images. For non-people it's amazingly good.

xnx | today at 5:34 PM

Z-Image seems to be the first successor to Stable Diffusion 1.5 that delivers better quality, capability, and extensibility across the board in an open model that can feasibly run locally. Excitement is high and an ecosystem is forming fast.

khimaros | today at 6:08 PM

i have been testing this on my Framework Desktop. ComfyUI generally causes an amdgpu kernel fault after about 40 steps (across multiple prompts), so i spent a few hours building a workaround: https://github.com/comfyanonymous/ComfyUI/pull/11143

overall it's fun and impressive. decent results using LoRA. you can achieve good-looking results with as few as 8 inference steps, which takes 15-20 seconds on a Strix Halo. i also created a llama.cpp inference custom node for prompt enhancement, which has been helping with overall output quality.
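the prompt-enhancement idea can be sketched without ComfyUI at all: llama.cpp's `llama-server` exposes a `/completion` HTTP endpoint, so a few lines of stdlib Python can ask a local LLM to expand a terse prompt before it reaches the image model. the instruction wording, port, and sampling settings below are my assumptions, not what the actual custom node does.

```python
# Sketch: prompt enhancement against a local llama.cpp server.
# Assumes llama-server is running on its default port (8080); the
# instruction template and parameters are illustrative choices.
import json
import urllib.request

LLAMA_URL = "http://127.0.0.1:8080/completion"

def build_request(user_prompt: str) -> dict:
    """Wrap the user's prompt in an instruction asking the LLM to
    expand it into a detailed image-generation prompt."""
    return {
        "prompt": (
            "Rewrite the following image prompt with concrete details "
            "about lighting, composition, and style. Reply with the "
            "rewritten prompt only.\n\n"
            f"Prompt: {user_prompt}\nRewritten:"
        ),
        "n_predict": 128,
        "temperature": 0.7,
    }

def enhance(user_prompt: str) -> str:
    """POST to llama-server and return the enhanced prompt text."""
    req = urllib.request.Request(
        LLAMA_URL,
        data=json.dumps(build_request(user_prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["content"].strip()
```

the enhanced string then just replaces the raw prompt in whatever text-to-image call you're making.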

nine_k | today at 5:40 PM

It's amazing how much knowledge about the world fits into 16 GiB of the distilled model.

xfalcox | today at 6:20 PM

We have vLLM for running text LLMs in production. What is the equivalent for this model?

thih9 | today at 8:04 PM

As an AI outsider with a recent 24GB MacBook, can I follow the quick start[1] steps from the repo and expect decent results? How much time would it take to generate a single medium-quality image?

[1]: https://github.com/Tongyi-MAI/Z-Image?tab=readme-ov-file#-qu...

zkmon | today at 5:38 PM

Just want to learn: who actually needs or buys generated images?

bilsbie | today at 9:44 PM

What kind of rig is required to run this?

Copenjin | today at 5:08 PM

Very good. Not always perfect with text or with following the prompt exactly, but it's 6B, so... impressive.

pawelduda | today at 5:10 PM

Did anyone test it on a 5090? I saw some 30xx reports and it seemed very fast.

cubefox | today at 9:26 PM

I'm particularly impressed that they seem to aim for photorealism rather than the semi-realistic AI look common to many text-to-image models.

BoredPositron | today at 7:51 PM

I wish they would have used the WAN VAE.

idontwantthis | today at 6:43 PM

Does it run on Apple silicon?
