BoorishBears, yesterday at 8:46 PM

Eh, I'm testing it now and it seems a bit too fast to be the same size: almost 2x the tokens per second and a much lower time to first token.
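For anyone wanting to compare the same two numbers, here is a rough sketch of how time to first token and tokens per second are commonly derived from a streamed response. The function name `stream_metrics` and the timestamps in the usage note are hypothetical, not the measurements mentioned above, and the sketch assumes you have already recorded an arrival time for each token.

```python
from typing import Sequence, Tuple


def stream_metrics(request_sent: float, token_times: Sequence[float]) -> Tuple[float, float]:
    """Return (ttft_seconds, tokens_per_second) for one streamed response.

    request_sent: timestamp (seconds) when the request was issued.
    token_times:  arrival timestamp (seconds) of each streamed token, in order.
    """
    if not token_times:
        raise ValueError("no tokens received")

    # Time to first token: delay between sending the request and the first token.
    ttft = token_times[0] - request_sent

    # Decode throughput: tokens after the first, divided by the time spent
    # generating them, so the initial latency is not counted twice.
    decode_time = token_times[-1] - token_times[0]
    if len(token_times) < 2 or decode_time <= 0:
        return ttft, float("nan")

    tokens_per_second = (len(token_times) - 1) / decode_time
    return ttft, tokens_per_second
```

With made-up timestamps, `stream_metrics(0.0, [0.5, 1.0, 1.5, 2.0, 2.5])` gives a TTFT of 0.5 s and 2.0 tokens per second. Averaging over several requests matters here, since load at launch can swing both numbers.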

There are other valid reasons it might be faster, but being faster even while everyone's rushing to try it at launch, plus a cost decrease, leaves me inclined to believe it's a smaller model than past Opus models.


Replies

kristianp, yesterday at 9:04 PM

It could be a combination of over-provisioning for early users, a smaller model, and more quantisation.