
windexh8er yesterday at 10:11 PM

I think you missed the point and don't understand / aren't considerate of SLM utility.


Replies

Kirby64 yesterday at 10:17 PM

But I’m not missing the point. If you can run one frontier model at 750 t/s, then you can probably run many, many instances of an SLM in parallel at an aggregate rate that exceeds 15k t/s. That’s kinda the point of the flash or ultrafast variants. And they’re built on something much more modern than Llama 3.1.
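To put rough numbers on that, here is a back-of-envelope sketch in Python. It assumes aggregate throughput scales linearly with the number of parallel instances (ignoring batching, memory limits, and scheduling overhead), and it uses the 750 t/s frontier figure as a floor for a single SLM instance. Both are assumptions for illustration, not measurements, and the function name is hypothetical.

```python
import math


def instances_needed(target_tps: float, per_instance_tps: float) -> int:
    """Smallest number of parallel instances whose combined rate
    strictly exceeds target_tps, assuming perfectly linear scaling."""
    if per_instance_tps <= 0:
        raise ValueError("per_instance_tps must be positive")
    return math.floor(target_tps / per_instance_tps) + 1


# Figures taken from the comment above; everything else is an assumption.
FRONTIER_TPS = 750      # single frontier model, tokens per second
TARGET_TPS = 15_000     # the 15k t/s figure being compared against

# Assumed floor: one SLM instance is at least as fast as the frontier model.
n = instances_needed(TARGET_TPS, FRONTIER_TPS)
print(n, n * FRONTIER_TPS)
```

Under those assumptions, 20 instances match 15k t/s and the 21st pushes past it. If a single SLM instance runs faster than 750 t/s, the count drops accordingly.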
