At some point, won't the bandwidth requirements exceed the number of pins you can fit within the available package area? Presumably you'd then end up back at a GPU-style design: high bandwidth, but a low maximum memory capacity.
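To put rough numbers on the pin question, here is a back-of-envelope sketch. Everything in it is an assumption for illustration: the function names are made up, the figures in the usage lines are placeholders rather than specs for any real part, and it only counts data pins (no power, ground, command, or address pins), so the real pin budget would be tighter.

```python
import math


def data_pins_needed(bandwidth_gb_per_s: float, gbit_per_s_per_pin: float) -> int:
    """Minimum number of data pins needed to carry a given memory bandwidth.

    bandwidth_gb_per_s: target bandwidth in gigabytes per second (assumed input).
    gbit_per_s_per_pin: signalling rate of one data pin in gigabits per second
                        (assumed input).
    """
    # Convert gigabytes/s to gigabits/s, then divide by the per-pin rate.
    gbit_per_s = bandwidth_gb_per_s * 8
    return math.ceil(gbit_per_s / gbit_per_s_per_pin)


def fits_in_package(bandwidth_gb_per_s: float,
                    gbit_per_s_per_pin: float,
                    max_data_pins: int) -> bool:
    """True if the required data pins fit within a hypothetical pin budget."""
    return data_pins_needed(bandwidth_gb_per_s, gbit_per_s_per_pin) <= max_data_pins


# Illustrative placeholder numbers only.
print(data_pins_needed(1000, 8))        # pins for 1000 GB/s at 8 Gbit/s per pin
print(fits_in_package(1000, 8, 2000))   # does that fit a 2000-data-pin budget?
print(fits_in_package(4000, 8, 2000))   # what about 4x the bandwidth?
```

The point is just that required pins scale linearly with bandwidth unless the per-pin rate goes up, while package area does not, which is why wide on-package memory ends up attractive once the budget runs out.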
I wonder how many of these you could cram into 1U, and what the maximum next-gen kW/U figure would look like.