presumably that happens at training time?
then once it's successfully trained, you get faster inference from just the diffusion model