
ipsod yesterday at 4:46 PM

How fast is it?



wolttam yesterday at 4:52 PM

2000 t/s prompt processing and 40-50 t/s generation. Generation should reach 60-70 t/s once DSpark support solidifies in vLLM, which looks to be a few days out.

Recent discussion on DSpark: https://news.ycombinator.com/item?id=48696585
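
To put those rates in wall-clock terms, here is a rough back-of-the-envelope sketch. The function name, the 8000-token prompt, and the 500-token reply are made up for illustration; it assumes a single request with no batching, queueing, or startup overhead, so treat the output as a ballpark only.

```python
def estimate_seconds(prompt_tokens, output_tokens, prefill_tps, decode_tps):
    """Rough latency estimate: prompt-processing time plus generation time.

    Assumes one request at a time and constant throughput (a simplification).
    """
    return prompt_tokens / prefill_tps + output_tokens / decode_tps


if __name__ == "__main__":
    # Hypothetical request size, not taken from the thread.
    prompt_tokens, output_tokens = 8000, 500
    prefill_tps = 2000  # prompt processing rate quoted above

    # Generation rates quoted above: 40-50 t/s today, 60-70 t/s expected.
    for decode_tps in (40, 50, 60, 70):
        total = estimate_seconds(prompt_tokens, output_tokens, prefill_tps, decode_tps)
        print(f"{decode_tps} t/s generation: {total:.1f} s total")
```

For a request shaped like this one, generation dominates the total, so the jump from 40-50 to 60-70 t/s matters more than prompt processing speed.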