Hacker News

sometimelurker, last Sunday at 4:35 PM

They use multi-token prediction (MTP) behind the scenes, and that might interact with the CoT in a strange way. Maybe they have different MTP models for different thinking modes? If so, that's interesting.


Replies

pyentropy, last Sunday at 4:38 PM

The number of tokens you predict at a time (multi or not) has nothing to do with whether the model wants to emit no, some, or a lot of reasoning tokens inside the reasoning tags, similar to how branch prediction does not really change a for loop's iteration count.
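
To make that concrete, here is a toy sketch. It assumes the multi-token path works as draft-and-verify (guesses are kept only when they match what the model would have emitted anyway), which is an assumption about the setup and not something confirmed in this thread. `next_token`, `draft`, and the token strings are all made up for illustration; no real model or library is involved.

```python
def next_token(context):
    """Toy deterministic 'model': emits 5 reasoning tokens, closes the tag, answers."""
    if "</think>" in context:
        return "<eos>" if context[-1] == "answer" else "answer"
    n_think = sum(1 for t in context if t == "think")
    return "think" if n_think < 5 else "</think>"


def draft(context, k):
    """Hypothetical cheap drafter: guesses that the last token simply repeats."""
    last = context[-1] if context else "think"
    return [last] * k


def decode_single(max_len=32):
    """Plain decoding: one token per step."""
    out, steps = [], 0
    while len(out) < max_len:
        tok = next_token(out)
        out.append(tok)
        steps += 1
        if tok == "<eos>":
            break
    return out, steps


def decode_multi(k=4, max_len=32):
    """Draft k tokens per round, keep only what the model agrees with."""
    out, steps = [], 0
    while len(out) < max_len and (not out or out[-1] != "<eos>"):
        steps += 1  # one verification round (one "big" step)
        for guess in draft(out, k):
            tok = next_token(out)  # what the model itself would emit here
            out.append(tok)
            if tok != guess or tok == "<eos>":
                break  # guess rejected: keep the model's token, end the round
    return out, steps


single_out, single_steps = decode_single()
multi_out, multi_steps = decode_multi(k=4)
print(single_out == multi_out)            # same tokens, same amount of "thinking"
print(single_steps, multi_steps)          # only the number of steps differs
```

Under these assumptions both decoders produce the identical sequence, including the same count of reasoning tokens; the multi-token version just gets there in fewer rounds, which is the branch-prediction analogy.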
