Hacker News

paxys, yesterday at 6:41 PM

Faster token generation = more reasoning loops in the same amount of time, so it can actually make the models smarter as well.


Replies

girvo, yesterday at 9:31 PM

Yeah! At a much smaller scale, being able to boost Step 3.7 Flash up to 40 tok/s on my Spark-alike with proper triple-head MTP (multi-token prediction) was the thing that made it superior to Qwen 3.6 27B in wall-clock time, despite Step reasoning more.

A lot of the open Chinese models get their results through huge reasoning loops. Being able to boost decode performance is what will make them worth it, and I’m sure OpenAI and Anthropic could do something similar (if they aren’t already).
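To put rough numbers on the wall-clock point, here is a minimal sketch of the arithmetic. Only the 40 tok/s figure comes from the comment above; the token counts, the 12 tok/s speed for the second model, and the function names are made-up assumptions for illustration, not measurements of Step or Qwen. It also ignores prompt processing and any other overhead.

```python
def wall_clock_seconds(total_tokens: int, tokens_per_second: float) -> float:
    """Decode time only: ignores prompt processing (prefill) and other overhead."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return total_tokens / tokens_per_second


def tokens_within_budget(budget_seconds: float, tokens_per_second: float) -> int:
    """How many tokens (reasoning included) fit in a fixed time budget."""
    return int(budget_seconds * tokens_per_second)


# 40 tok/s is the figure from the comment; everything else is hypothetical.
heavy_reasoner = wall_clock_seconds(total_tokens=8000, tokens_per_second=40.0)
light_reasoner = wall_clock_seconds(total_tokens=3000, tokens_per_second=12.0)

print(f"heavy reasoner, fast decode: {heavy_reasoner:.0f} s")
print(f"light reasoner, slow decode: {light_reasoner:.0f} s")
print(f"tokens in a 60 s budget at 40 tok/s: {tokens_within_budget(60, 40.0)}")
print(f"tokens in a 60 s budget at 12 tok/s: {tokens_within_budget(60, 12.0)}")
```

With these assumed numbers, the model that emits more than twice as many tokens still finishes first, and a fixed time budget buys roughly three times as much reasoning at the higher decode speed.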