Faster tokens = more reasoning loops, so it can actually make the models smarter as well.

paxys • yesterday at 6:41 PM • 1 reply

Replies

Yeah! So at a much smaller scale, being able to boost Step 3.7 Flash up to 40 tok/s on my Spark-alike with proper triple-head MTP (multi-token prediction) was the thing that made it faster than Qwen 3.6 27B in wall-clock time, despite Step reasoning more.

A lot of the open Chinese models get their results through huge reasoning loops. Being able to boost decode perf is what will make them worth it, and I’m sure OpenAI and Anthropic could do something similar (if they aren’t already).
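
To make the wall-clock point concrete, here is a rough back-of-the-envelope sketch. Only the 40 tok/s figure comes from the comment above; the token counts and the 10 tok/s baseline are made-up illustrative numbers, not measurements of Step or Qwen, and the helper name `wall_clock_seconds` is hypothetical. It also ignores prefill time and only counts decode.

```python
def wall_clock_seconds(total_tokens: int, tokens_per_second: float) -> float:
    """Decode time only: tokens generated divided by decode throughput."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return total_tokens / tokens_per_second


# ASSUMPTION: token counts and the 10 tok/s baseline are illustrative only.
# 40 tok/s is the one figure taken from the comment.
verbose_fast = wall_clock_seconds(total_tokens=8000, tokens_per_second=40.0)
terse_slow = wall_clock_seconds(total_tokens=3000, tokens_per_second=10.0)

print(f"more reasoning, faster decode: {verbose_fast:.0f} s")
print(f"less reasoning, slower decode: {terse_slow:.0f} s")
```

With these assumed numbers the model that emits more reasoning tokens still finishes first, because its decode speed advantage (4x) is larger than its token count disadvantage (about 2.7x). If the ratios were flipped, the result would flip too.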
