I’m currently optimizing FLUX to run on a cluster of consumer 8GB VRAM cards (RTX 4060s). I noticed ...

spencer9714 • today at 6:41 PM • 0 replies

I’m currently optimizing FLUX to run on a cluster of consumer 8GB VRAM cards (RTX 4060s). I noticed Lemonade emphasizes NPU and GPU orchestration. Have you found that offloading the 'aesthetic scoring' or 'text encoding' to the NPU significantly frees up VRAM for the main diffusion process, or is the overhead of moving tensors back and forth too high on consumer hardware?
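
A back-of-envelope sketch of the transfer side of this question. Every number here is an assumption for illustration, not a measurement: the embedding shape (512 tokens by 4096 dimensions, 16-bit), the 8 GB/s effective host-to-GPU bandwidth, and the helper names `tensor_bytes` and `transfer_ms` are all hypothetical.

```python
def tensor_bytes(shape, bytes_per_elem=2):
    """Size in bytes of a dense tensor with the given shape (default: 16-bit elements)."""
    n = 1
    for dim in shape:
        n *= dim
    return n * bytes_per_elem


def transfer_ms(num_bytes, bandwidth_gb_s):
    """Idealized copy time in milliseconds, ignoring latency and driver overhead."""
    return num_bytes / (bandwidth_gb_s * 1e9) * 1e3


# Assumed text-embedding shape: batch 1, 512 tokens, 4096 dims, 16-bit.
embedding = tensor_bytes((1, 512, 4096))

# Assumed effective host-to-GPU bandwidth in GB/s; measure this on real hardware.
assumed_bandwidth = 8.0

print(f"embedding size: {embedding / 2**20:.1f} MiB")
print(f"idealized copy time: {transfer_ms(embedding, assumed_bandwidth):.2f} ms")
```

Under these assumptions the embedding itself is only a few MiB and is produced once per prompt rather than once per denoising step, so the copy is unlikely to dominate. The VRAM question is more about where the encoder weights live than about moving its output.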
