Hacker News

bjelkeman-again · yesterday at 5:44 PM

Interesting. It seems to me that at that speed (20-30 tokens/sec) on local hardware, the real issue is output quality, not tokens per second.


Replies

NitpickLawyer · yesterday at 5:53 PM

It really depends. The new "thinking" models usually spend some time reasoning before they write the final answer. If they "think" for 1k tokens, at 20-30 tok/s that's roughly 33-50 seconds, close to a minute of spinning wheel, for every question. Add prompt processing on top, plus speeds that drop as the context grows, and longer sessions become really slow.
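To make that arithmetic concrete, here is a rough back-of-envelope sketch. The prompt size, answer length and prefill speed (200 tok/s) in the usage line are hypothetical numbers picked for illustration, not measurements. The function assumes constant throughput, so it ignores the slowdown as context grows and is best read as a lower bound.

```python
def turn_latency_s(prompt_tokens: int, thinking_tokens: int, answer_tokens: int,
                   prefill_tps: float, decode_tps: float) -> float:
    """Rough wall-clock seconds for one turn, assuming constant throughput.

    Real local setups get slower as the context grows, so this is a lower bound.
    """
    prefill_s = prompt_tokens / prefill_tps  # time to process the prompt
    # Hidden "thinking" tokens are generated at decode speed, just like the answer.
    decode_s = (thinking_tokens + answer_tokens) / decode_tps
    return prefill_s + decode_s


# Thinking-only wait for 1k tokens at the 20-30 tok/s mentioned above.
for tps in (20, 30):
    print(f"{tps} tok/s: {1000 / tps:.0f} s")  # prints "20 tok/s: 50 s", then "30 tok/s: 33 s"

# Hypothetical turn: 4k-token prompt, 1k thinking tokens, 300-token answer.
print(turn_latency_s(4000, 1000, 300, prefill_tps=200, decode_tps=25))  # prints 72.0
```

In this made-up example the thinking tokens alone account for 40 of the 72 seconds, which is the point being made: on slow local hardware, hidden reasoning dominates the wait.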
