embedding-shape • today at 10:49 AM • 1 reply

I don't, it fits on my card with the full context. I think the native MXFP4 weights take ~70GB of VRAM (out of 96GB available on an RTX Pro 6000), so I still have room to spare to run GPT-OSS-20B alongside for smaller tasks too, plus Wayland+Gnome :)
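
A rough sketch of the memory budget described in the comment above, in case the arithmetic helps. The 96GB and ~70GB figures come from the comment; the GPT-OSS-20B footprint and the desktop overhead are assumptions picked for illustration, not measurements, and the helper name `remaining_vram` is hypothetical.

```python
# Rough VRAM budget for a single 96GB card.
# Values marked "assumed" are illustrative and not taken from the thread.

TOTAL_VRAM_GB = 96    # RTX Pro 6000, per the comment
MAIN_MODEL_GB = 70    # native MXFP4 weights, "~70GB" per the comment
SMALL_MODEL_GB = 16   # assumed footprint for GPT-OSS-20B
DESKTOP_GB = 2        # assumed Wayland+Gnome overhead


def remaining_vram(total_gb, *allocations_gb):
    """Return the VRAM left after subtracting every allocation."""
    return total_gb - sum(allocations_gb)


headroom = remaining_vram(TOTAL_VRAM_GB, MAIN_MODEL_GB)
after_all = remaining_vram(
    TOTAL_VRAM_GB, MAIN_MODEL_GB, SMALL_MODEL_GB, DESKTOP_GB
)

print(f"Headroom after main model: {headroom} GB")
print(f"Headroom after everything: {after_all} GB")
```

With these assumed numbers everything still fits on the one card; swap in measured values from your own setup to check.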

Replies

storystarling • today at 12:24 PM

I thought the RTX 6000 Ada was 48GB? If you have 96GB available that implies a dual setup, so you must be relying on tensor parallelism to shard the model weights across the pair.
