For LLM inference, I don't think PCIe bandwidth matters much, and a GPU can greatly improve prompt processing speed.
Only if your entire model fits in the GPU's VRAM.
To me this reads like "if you can afford those 256GB VRAM GPUs, you don't need PCIe bandwidth!"
Yeah, I think so. Once the whole model is on the GPU (at the cost of a potentially slower start-up), there really isn't much traffic between the GPU and the motherboard. That's how I think about it, but I'm mostly saying it because I'm interested in being corrected if I'm wrong.
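A rough back-of-envelope supports that intuition. The numbers below (model size, PCIe generation, vocabulary size) are illustrative assumptions, not measurements, but they show the shape of it: after the one-time weight upload, per-token PCIe traffic is tiny.

```python
# Illustrative back-of-envelope: once weights live in VRAM, the only per-token
# PCIe traffic is token ids going in and logits (or just the sampled token)
# coming back. All figures below are assumptions chosen for illustration.

GIB = 1024**3

model_weights_bytes = 8 * GIB   # e.g. a ~14B model at 4-bit quantization (assumed)
pcie_bw = 32 * GIB              # ~32 GiB/s theoretical one-way PCIe 4.0 x16

# One-time cost: copying the weights to the GPU at start-up.
load_time_s = model_weights_bytes / pcie_bw

# Per-token cost: even shipping full fp16 logits over a 128k-entry
# vocabulary is only a few hundred KiB.
vocab_size = 128_000            # assumed vocabulary size
per_token_bytes = vocab_size * 2  # fp16 logits ≈ 256 KiB
per_token_time_s = per_token_bytes / pcie_bw

print(f"one-time weight upload : ~{load_time_s:.2f} s")
print(f"per-token PCIe transfer: ~{per_token_time_s * 1e6:.0f} µs")
```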
The Strix Halo iGPU is quite special: like the Apple iGPUs, it has such good memory bandwidth to system RAM that it manages to improve both prompt processing and token generation compared to pure CPU inference. You really can't say that about the average iGPU or low-end dGPU: usually their memory bandwidth is way too anemic, so the CPU wins when it comes to emitting tokens.
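A quick sketch of why bandwidth dominates token generation, under the common assumption that decoding is memory-bandwidth-bound (each generated token streams roughly all the weights once, so tokens/s is capped at bandwidth divided by model size). The bandwidth figures are ballpark assumptions, not benchmarks.

```python
# Rough upper bound on decode speed, assuming token generation is
# memory-bandwidth-bound: tokens/s <= memory bandwidth / model size.
# All bandwidth numbers are ballpark assumptions for illustration.

GB = 1e9

model_bytes = 8 * GB  # assumed ~14B params at 4-bit quantization

bandwidths = {
    "dual-channel DDR5 CPU":          80 * GB,   # typical desktop, assumed
    "Strix Halo / Apple-class iGPU": 250 * GB,   # wide LPDDR5X, assumed
    "low-end dGPU (narrow GDDR6)":   100 * GB,   # assumed; may not even fit the model
}

for name, bw in bandwidths.items():
    print(f"{name:32s} ~{bw / model_bytes:5.1f} tok/s upper bound")
```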