Trillion-parameter model instances (Opus, GPT, etc.) are not running on a single GPU. The catch is how the cards communicate and how the model is split up. There's a lot that goes into it, but the answer is yes: the more GPUs, the bigger the model you can run.
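A quick back-of-envelope sketch of why splitting helps: if you shard a model's weights evenly across GPUs (as tensor or pipeline parallelism does), per-card weight memory drops roughly linearly. The model size and GPU counts below are illustrative assumptions, and this ignores activations, KV cache, and communication overhead.

```python
def per_gpu_gb(n_params: float, bytes_per_param: int, n_gpus: int) -> float:
    """Weight memory per GPU in GB, assuming an even shard of the weights."""
    return n_params * bytes_per_param / n_gpus / 1e9

# A hypothetical 70B-parameter model in fp16 (2 bytes/param):
whole = per_gpu_gb(70e9, 2, 1)    # 140.0 GB -- far too big for one card
shard = per_gpu_gb(70e9, 2, 8)    # 17.5 GB  -- fits on eight 24 GB cards
print(whole, shard)
```

The same arithmetic is why inter-GPU bandwidth matters: every forward pass has to move partial results between those shards.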
Really cool. I'm very much still learning about this stuff. Sounds like this inter-GPU communication is a feature of special hardware (not consumer GPUs).