Hacker News

iamtheworstdev, today at 4:57 PM

Are you running NVLink? I have the same setup but without NVLink, and it feels like it's best to just split up the 3090s and run separate models concurrently. But I also have no idea what I'm doing.


Replies

fluoridation, today at 7:40 PM

It depends on what you're comparing. If the model only fits in the combined VRAM of both cards and not in a single card's VRAM, then running two instances of it won't be faster. If you're comparing a 23 GB model running duplicated (one copy per card) against a 46 GB model running split across both cards, then yeah, the duplicated setup will likely be faster, simply because there's no synchronization between cards.
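To make the fit check concrete, here's a rough sketch. It only looks at weight size and assumes 24 GB per 3090; the function name and the labels are made up for illustration, and it ignores KV cache and runtime overhead, which would eat into the headroom in practice.

```python
def choose_layout(model_gb: float, card_gb: float = 24.0, n_cards: int = 2) -> str:
    """Pick a layout from weight size alone.

    Assumption: KV cache, activations, and runtime overhead are ignored,
    so real headroom is smaller than this suggests.
    """
    if model_gb <= card_gb:
        # A full copy fits on each card: run independent instances, no cross-card sync.
        return "replicate"
    if model_gb <= card_gb * n_cards:
        # Only fits when spread over all cards: one instance in total.
        return "split"
    return "does not fit"


print(choose_layout(23))  # replicate
print(choose_layout(46))  # split
```

So the 23 GB case gets you two independent instances, while the 46 GB case gets you one instance that has to coordinate both cards.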

AFAIUI, there'd be little advantage in having a higher-speed inter-card connection, because the cards don't really talk to each other during inference. The loss of efficiency compared to a monolithic memory architecture comes from scheduling, not from data transfer.
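A back-of-the-envelope sketch of that claim, assuming the model is split by layers so that only one token's hidden state crosses between cards per step. Every number here (hidden size 8192, 2 bytes per value, a 16 GB/s link, 20 ms of compute per card) is a placeholder I picked for illustration, not a measurement, and link latency is ignored. A tensor-parallel split exchanges more data per step and isn't what this models.

```python
def handoff_seconds(hidden_size: int, bytes_per_value: int, link_gb_per_s: float) -> float:
    """Time to move one token's hidden state across the inter-card link."""
    payload_bytes = hidden_size * bytes_per_value
    return payload_bytes / (link_gb_per_s * 1e9)


def split_token_seconds(stage_seconds: list[float], handoff: float) -> float:
    """Per-token time for one request when the stages run one after another.

    Each card waits for the previous one, so stage times add up; there is
    one handoff between each pair of consecutive stages.
    """
    return sum(stage_seconds) + handoff * (len(stage_seconds) - 1)


# Hypothetical numbers, for illustration only.
handoff = handoff_seconds(hidden_size=8192, bytes_per_value=2, link_gb_per_s=16.0)
total = split_token_seconds([0.020, 0.020], handoff)

print(f"handoff per token: {handoff * 1e6:.3f} microseconds")
print(f"handoff share of token time: {handoff / total:.6%}")
```

Under those assumptions the transfer is a rounding error next to the compute time. What costs you is that each card sits idle while the other one works on its half of the layers.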