bjelkeman-again • yesterday at 5:44 PM

Interesting. It seems to me that at that speed (20-30 tokens per second) on local hardware, the real issue is quality of output, not tokens per sec.

Replies

NitpickLawyer • yesterday at 5:53 PM

It really depends. The new "thinking" models usually spend some time reasoning before they write the final answer. If they "think" for 1k tokens at 20-30 tokens per second, that's the better part of a minute of spinning wheel you're gonna see for each question. Add prompt processing and the drop in speed as context grows, and it becomes really slow for longer sessions.
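To put rough numbers on that, here is a back-of-the-envelope sketch. The function name and the optional prompt-processing parameters are made up for illustration; only the 1k thinking tokens and the 20-30 tokens per second come from the thread, and the slowdown at longer context is not modeled.

```python
def seconds_before_answer(thinking_tokens, tokens_per_second,
                          prompt_tokens=0, prefill_tokens_per_second=None):
    """Estimate the wait before the first visible answer token.

    thinking_tokens / tokens_per_second is the time spent generating
    reasoning tokens. The prompt-processing term is optional and purely
    hypothetical: the thread gives no numbers for it.
    """
    wait = thinking_tokens / tokens_per_second
    if prefill_tokens_per_second:
        wait += prompt_tokens / prefill_tokens_per_second
    return wait


# Figures from the thread: 1k thinking tokens at 20-30 tokens per second.
for tps in (20, 30):
    wait = seconds_before_answer(1000, tps)
    print(f"{tps} tok/s -> {wait:.0f} s before the answer starts")
# prints:
# 20 tok/s -> 50 s before the answer starts
# 30 tok/s -> 33 s before the answer starts
```

So the wait is roughly 33 to 50 seconds per question from thinking alone, before any prompt-processing time is added on top.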
