Hacker News

arbuge, yesterday at 6:48 PM

Perhaps somebody more familiar with HF can explain this to me... I'm not too sure what's going on here:

https://huggingface.co/inference/models?model=zai-org%2FGLM-...


Replies

Mattwmaster58, yesterday at 7:17 PM

I assume you're talking about 50 t/s? My guess is that providers are poorly managing resources.

Slow inference is also present on z.ai; eyeballing it just now, the 4.7 flash model was running at about half the speed of regular 4.7.
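
For what it's worth, here's a rough way to sanity-check a provider's throughput yourself. This is a minimal sketch, not the providers' own benchmark: it assumes a recent huggingface_hub with InferenceClient, an HF_TOKEN in the environment, and a placeholder model id standing in for the actual zai-org/GLM-... repo. It also times the whole request (prompt processing plus decode), so it understates pure decode speed and is only good for ballpark comparisons between providers.

```python
# Ballpark tokens/sec measurement against a Hugging Face-hosted model.
import os
import time

from huggingface_hub import InferenceClient

MODEL_ID = "zai-org/GLM-..."  # placeholder; substitute the real repo id

client = InferenceClient(model=MODEL_ID, token=os.environ["HF_TOKEN"])

start = time.monotonic()
resp = client.chat_completion(
    messages=[{"role": "user", "content": "Explain KV caching in two short paragraphs."}],
    max_tokens=512,
)
elapsed = time.monotonic() - start

out_tokens = resp.usage.completion_tokens
print(f"{out_tokens} tokens in {elapsed:.1f}s -> {out_tokens / elapsed:.1f} tok/s")
```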
