logoalt Hacker News

storystarlingtoday at 12:24 PM1 replyview on HN

I thought the RTX 6000 Ada was 48GB? If you have 96GB available that implies a dual setup, so you must be relying on tensor parallelism to shard the model weights across the pair.


Replies

embedding-shapetoday at 12:40 PM

RTX Pro 6000 - 96GB VRAM - Single card