Hacker News

vb-8448 yesterday at 10:07 PM

Actually, even with 9k hardware you won't get good enough performance. There is an interesting video from antirez on trying to run DeepSeek V4 Flash at 2 bits on an M3 Max with 128GB ... and the result is kind of disappointing: as soon as the context starts growing, you are at around 20 tokens/s.


Replies

zozbot234 yesterday at 10:15 PM

Prefill performance used to be the real bottleneck on antirez's DS4, and that's been greatly improved by now; it doesn't perceptibly slow down with growing context.