A nvidia spark thingie has 128GB unified RAM. They also have a dual port version of one of these things: https://www.nvidia.com/content/dam/en-zz/Solutions/networkin.... ie 2 x 100GB/s ports, they may even be 2 x 200GB/s. Once I've got my paws on one, I'll know more.
You can cluster these beasts too. Two and three (with two IP subnets) is fairly obvious. Four or more might need a switch depending on how much network latency affects things.
Apple seem to have forgotten about M series with gobs of RAM. I can't get the Apple shop to show more than 96GB of unified RAM and that costs a kidney.
200 Gb / s (not GB/s)!
(Still potentially very useful! But not magically ultra fast.)
I have one, and I love it. That said my buddies Mac smokes it for inference workloads in terms of tokens per second AND its more usable for other things.
If you are training and doing research it's great, if you want to cluster them it cant be beat, but if you just want local inference on a single box buy a mac or even a strix halo device.