You're not paying enough attention to the performance and cost impact of connectors and sockets. CAMM2/LPCAMM and CUDIMM have yet to be demonstrated operating at speeds that match the fastest soldered LPDDR, let alone GDDR; there's still a clear advantage for soldering memory.
CPU sockets with more than two memory channels are also far more expensive: the higher pin count usually increases the number of layers the motherboard needs, and the larger socket requires more metal for stiffening. (EPYC CPUs still have issues with imperfect mounting leading to some I/O lanes not working.)
Using BGA soldering for both the processor and the memory sidesteps a bunch of engineering challenges.