Exolabs claims it can distribute the compute across many machines and use their memory in aggregate: https://github.com/exo-explore/exo
Maybe there's enough memory across many machines combined.
That's the general idea, but the hard part is having a pile of local machines with nearly a TB of VRAM between them to distribute it over. At 24 GB per card, you'd need over 30 RTX 3090s' worth of GPUs to run those models.
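A back-of-the-envelope sketch of that count, assuming the weights alone take roughly 1 TB of VRAM and ignoring activation and KV-cache overhead (so the real number is somewhat higher):

```python
import math

# Assumption: "nearly a TB" of weights has to sit entirely in VRAM, no offloading.
model_vram_gb = 1000     # approximate total VRAM needed for the model
vram_per_3090_gb = 24    # RTX 3090 memory capacity

cards_needed = math.ceil(model_vram_gb / vram_per_3090_gb)
print(f"{cards_needed} x RTX 3090 to hold ~{model_vram_gb} GB")  # -> 42
```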