Really, why have the PCI/CPU artifice at all? Apple and Nvidia have the right idea: put the MPP on the same die/package as the CPU.
> put the MPP on the same die/package as the CPU.
That would help in latency-constrained workloads, but I don't think it would make much of a difference for AI or most HPC applications.
What we need for that are low-power CPUs with high PCIe lane counts, purely for shoving models from NVMe to the GPU.
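A rough back-of-envelope sketch of why lane count matters here. It assumes the PCIe link is the only bottleneck, counts only 128b/130b line encoding (ignoring other protocol overhead), and uses a hypothetical 140 GB model (about 70B parameters at 2 bytes each). None of these figures come from the thread.

```python
# Back-of-envelope: time to move model weights over a PCIe link.
# Assumptions (illustrative only): the link is the sole bottleneck, and
# overhead beyond 128b/130b line encoding is ignored.

# Per-lane raw rate in GT/s and line-encoding efficiency, by PCIe generation.
PCIE_GENS = {
    3: (8.0, 128 / 130),
    4: (16.0, 128 / 130),
    5: (32.0, 128 / 130),
}


def lane_bandwidth_gbps(gen: int) -> float:
    """Usable one-direction bandwidth of a single lane, in GB/s."""
    gt_per_s, efficiency = PCIE_GENS[gen]
    return gt_per_s * efficiency / 8  # 8 bits per byte


def load_time_s(model_gb: float, gen: int, lanes: int) -> float:
    """Seconds to transfer model_gb gigabytes over a link with `lanes` lanes."""
    return model_gb / (lane_bandwidth_gbps(gen) * lanes)


if __name__ == "__main__":
    model_gb = 140.0  # hypothetical model size, not from the thread
    for lanes in (4, 8, 16):
        print(f"PCIe 4.0 x{lanes}: {load_time_s(model_gb, 4, lanes):.1f} s")
```

Load time halves with each doubling of lanes. Since an NVMe drive typically sits on four lanes, keeping a GPU's x16 link busy takes several drives plus the GPU's own lanes, which is where a high lane count on an otherwise modest CPU pays off.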