Linear architectures are at least heavily used in image diffusion models. More so in fact than in language models.