10 tok/s is quite fine for chatting, though less so for agentic workloads. So the technique is still worthwhile for running a huge model locally.
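A quick back-of-envelope sketch of why the same decode speed feels fine for chat but painful for agents; the token counts here are illustrative assumptions, not measurements:

```python
def generation_seconds(tokens: int, tok_per_s: float = 10.0) -> float:
    """Wall-clock time to decode `tokens` at a fixed speed."""
    return tokens / tok_per_s

# A single chat reply (~300 tokens) streams in about half a minute,
# which is tolerable because you read along as it generates.
chat = generation_seconds(300)        # 30 s

# An agentic run chains many generations (say 10 tool-use steps of
# ~800 tokens each), and you wait on the whole chain before anything
# useful happens.
agent = generation_seconds(10 * 800)  # 800 s, over 13 minutes

print(f"chat reply: {chat:.0f} s, agent run: {agent:.0f} s")
```

The asymmetry is in the waiting pattern, not the raw speed: chat latency is amortized by streaming, while agent latency compounds across steps.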