Hacker News

tarruda · yesterday at 10:12 PM

> It's fast (~3 seconds on my RTX 4090)

It is amazing how far behind Apple Silicon is when it comes to running non-language models.

Using the reference code from Z-Image on my M1 Ultra, it takes 8 seconds per step, so over a minute for the default of 9 steps.


Replies

p-e-w · yesterday at 11:57 PM

The diffusion process is usually compute-bound, while transformer inference is memory-bound.

Apple Silicon is comparable in memory bandwidth to mid-range GPUs, but it’s light years behind on compute.
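
A rough roofline sketch makes the gap concrete. The peak figures below (FP16 tensor throughput and bandwidth for the RTX 4090, FP32 GPU throughput and bandwidth for the M1 Ultra) are approximate published specs, not measurements, so treat the ratios as order-of-magnitude only:

    # Back-of-envelope roofline comparison (approximate published peaks,
    # not measured numbers).
    specs = {
        "RTX 4090": {"tflops": 165.0, "bandwidth_gbs": 1008.0},
        "M1 Ultra": {"tflops": 21.0,  "bandwidth_gbs": 800.0},
    }

    for name, s in specs.items():
        # Arithmetic intensity (FLOPs per byte) above which a kernel stops
        # being limited by memory bandwidth and becomes limited by compute.
        ridge = (s["tflops"] * 1e12) / (s["bandwidth_gbs"] * 1e9)
        print(f"{name}: ridge point ~{ridge:.0f} FLOPs/byte")

    compute_ratio = specs["RTX 4090"]["tflops"] / specs["M1 Ultra"]["tflops"]
    bandwidth_ratio = (specs["RTX 4090"]["bandwidth_gbs"]
                       / specs["M1 Ultra"]["bandwidth_gbs"])
    print(f"compute gap   ~{compute_ratio:.1f}x")    # roughly 8x
    print(f"bandwidth gap ~{bandwidth_ratio:.1f}x")  # roughly 1.3x

    # Diffusion steps are dominated by large matmuls/convolutions with high
    # arithmetic intensity, so they sit on the compute-limited side of the
    # roofline and the ~8x compute gap shows up almost directly in step time.
    # Token-by-token LLM decoding streams the full weight matrix per token
    # (only a few FLOPs per byte), so the ~1.3x bandwidth gap is what matters
    # there, which is why Apple Silicon looks far more competitive for LLMs.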
