Hacker News

p-e-w, yesterday at 11:57 PM

The diffusion process is usually compute-bound, while transformer inference is memory-bound.

Apple Silicon is comparable in memory bandwidth to mid-range GPUs, but it’s light years behind on compute.
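
A rough way to see the compute-bound vs. memory-bound distinction is a roofline-style estimate. The sketch below is only illustrative: the peak FLOP/s and bandwidth figures are made-up placeholders, not measurements of any real chip, and the two cost functions are simplified stand-ins (a matrix-vector product for single-token decoding, a large matrix-matrix product for a denoising step that processes many positions at once).

```python
# Roofline-style classification: a kernel is memory-bound when its
# arithmetic intensity (FLOPs per byte moved) is below the machine
# balance (peak FLOP/s divided by peak bytes/s).

# Hypothetical hardware numbers, chosen only for illustration.
PEAK_FLOPS = 100e12      # 100 TFLOP/s (assumed)
PEAK_BANDWIDTH = 1e12    # 1 TB/s (assumed)


def matvec_cost(d, bytes_per_elem=2):
    """d x d weight matrix times one vector (batch-1 token decoding)."""
    flops = 2 * d * d
    bytes_moved = bytes_per_elem * (d * d + 2 * d)  # weights + input + output
    return flops, bytes_moved


def matmul_cost(n, d, bytes_per_elem=2):
    """n x d activations times d x d weights (many positions at once)."""
    flops = 2 * n * d * d
    bytes_moved = bytes_per_elem * (d * d + 2 * n * d)  # weights + input + output
    return flops, bytes_moved


def classify(flops, bytes_moved, peak_flops=PEAK_FLOPS, peak_bw=PEAK_BANDWIDTH):
    """Return 'compute-bound' or 'memory-bound' under the roofline model."""
    intensity = flops / bytes_moved
    machine_balance = peak_flops / peak_bw
    return "compute-bound" if intensity >= machine_balance else "memory-bound"


if __name__ == "__main__":
    print("matvec:", classify(*matvec_cost(4096)))
    print("matmul:", classify(*matmul_cost(4096, 4096)))
```

With these assumed numbers the matrix-vector case lands at roughly 1 FLOP per byte, far below the machine balance of 100, while the large matrix-matrix case lands well above it. Real workloads mix both kinds of kernels, so treat this as intuition rather than a prediction.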


Replies

tarruda, today at 12:47 AM

> but it’s light years behind on compute.

Is that the only factor, though? I wonder if PyTorch is lacking optimizations for the MPS backend.