Flash is less than 160 GB. No need to quantize to fit in 2x 96 GB. Not sure how much context fits in 30 GB, but it should be a good amount.
It seems to be 160GB at mixed FP4+FP8 precision, FYI. Full FP8 is 250GB+. (B)F16 at around double I would assume.
It seems to be 160GB at mixed FP4+FP8 precision, FYI. Full FP8 is 250GB+. (B)F16 at around double I would assume.