"GLM 5.2 is "almost Opus," and it needs at least 8xH200s for comfortable inference ..."
What would happen if one were to run GLM 5.2 on only a single H200?
Would it fail to run at all, or would it just run so slowly as to be unusable?
I would like to prove out the build, and the concept, of running a SOTA model locally, and then backfill the rest of the GPUs in 18-24 months when they cost significantly less ...
> in 18-24 months when they cost significantly less ...
going to need you to sit down for this one...