Perhaps somebody more familiar with HF can explain this to me... I'm not too sure what's going on here:
https://huggingface.co/inference/models?model=zai-org%2FGLM-...
I assume you're talking about the ~50 tokens/s? My guess is that the providers are managing resources poorly.
Slow inference is also present on z.ai; eyeballing it, the 4.7 flash model is running about twice as slow as regular 4.7 right now.
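Rather than eyeballing it, throughput is easy to measure from a streaming response. A minimal sketch, assuming a client that yields text chunks (the `fake_stream` generator here is a stand-in for a real provider stream, and the 4-chars-per-token ratio is a rough heuristic, not an exact tokenizer):

```python
import time


def measure_throughput(stream, chars_per_token=4):
    """Consume an iterable of text chunks; return (est_tokens, tokens_per_sec).

    Token count is estimated from character length, since most provider
    streams don't expose exact token counts per chunk.
    """
    start = time.perf_counter()
    n_chars = 0
    for chunk in stream:
        n_chars += len(chunk)
    elapsed = time.perf_counter() - start
    est_tokens = max(1, n_chars // chars_per_token)
    return est_tokens, est_tokens / elapsed


def fake_stream():
    """Simulated provider stream: 20 chunks of 8 chars, ~1 ms apart."""
    for _ in range(20):
        time.sleep(0.001)
        yield "abcdefgh"


tokens, tps = measure_throughput(fake_stream())
print(f"~{tokens} tokens at {tps:.0f} t/s")
```

Running the same snippet against two models (e.g. flash vs. regular) gives a concrete side-by-side number instead of a gut feel.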