The diffusion process is usually compute-bound, while transformer inference is memory-bound. Apple...

p-e-w • yesterday at 11:57 PM • 1 reply • view on HN

The diffusion process is usually compute-bound, while transformer inference is memory-bound.

Apple Silicon is comparable in memory bandwidth to mid-range GPUs, but it’s light years behind on compute.

tarruda • today at 12:47 AM

> but it’s light years behind on compute.

Is that the only factor though? I wonder if pytorch is lacking optimization for the MPS backend.

alt Hacker News