Perhaps somebody more familiar with HF can explain this to me... I'm not too sure what's going on here:
https://huggingface.co/inference/models?model=zai-org%2FGLM-...
I assume you're talking about the ~50 tokens/s? My guess is that the providers are managing resources poorly.
Slow inference is also present on z.ai; eyeballing it, the 4.7 flash model is running about twice as slow as regular 4.7 right now.
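Rather than eyeballing it, throughput is easy to measure from a streaming response. A minimal sketch, assuming a client that yields text chunks (the `fake_stream` generator here is a stand-in for a real provider stream, and the 4-chars-per-token ratio is a rough heuristic, not an exact tokenizer):

```python
import time


def measure_throughput(stream, chars_per_token=4):
    """Consume an iterable of text chunks; return (est_tokens, tokens_per_sec).

    Token count is estimated from character length, since most provider
    streams don't expose exact token counts per chunk.
    """
    start = time.perf_counter()
    n_chars = 0
    for chunk in stream:
        n_chars += len(chunk)
    elapsed = time.perf_counter() - start
    est_tokens = max(1, n_chars // chars_per_token)
    return est_tokens, est_tokens / elapsed


def fake_stream():
    """Simulated provider stream: 20 chunks of 8 chars, ~1 ms apart."""
    for _ in range(20):
        time.sleep(0.001)
        yield "abcdefgh"


tokens, tps = measure_throughput(fake_stream())
print(f"~{tokens} tokens at {tps:.0f} t/s")
```

Running the same snippet against two models (e.g. flash vs. regular) gives a concrete side-by-side number instead of a gut feel.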