
Mattwmaster58 · yesterday at 7:17 PM

I assume you're talking about 50 t/s? My guess is that providers are poorly managing resources.

Slow inference is also present on z.ai; eyeballing it, the 4.7 Flash model is about twice as slow as regular 4.7 right now.


Replies

arbuge · yesterday at 8:40 PM

None of it makes much sense. The model labelled as fastest has much higher latency. The one labelled as cheapest costs something, while the other one appears to be free (its price is blank). The context window listed for that one is also blank and unclear.