sometimelurker • last Monday at 1:16 AM

No, it might. A high-reasoning task is probably harder than a low-reasoning task, so the same MTP (multi-token prediction) LLM will predict more tokens correctly on the low-reasoning task. To compensate for this, big labs likely use different MTP LLMs for different cases. It would make sense for them to do so.
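
To put rough numbers on this, here is a minimal sketch. It assumes each drafted token is accepted independently with the same probability, which is a simplification borrowed from speculative-decoding analyses and not something I know about any lab's setup. The function name and the two acceptance rates are hypothetical, picked only for illustration.

```python
def expected_tokens_per_step(accept_prob: float, draft_len: int) -> float:
    """Expected tokens emitted per verification pass.

    Assumption (hypothetical model): each of the `draft_len` drafted tokens
    is accepted independently with probability `accept_prob`, and the pass
    always yields one extra token from the main model (either the correction
    after the first rejection or a bonus token if everything was accepted).
    """
    if not 0.0 <= accept_prob <= 1.0:
        raise ValueError("accept_prob must be in [0, 1]")
    if draft_len < 0:
        raise ValueError("draft_len must be non-negative")
    if accept_prob == 1.0:
        # Every drafted token is accepted, plus the bonus token.
        return float(draft_len + 1)
    # Geometric series: 1 + p + p^2 + ... + p^draft_len
    return (1.0 - accept_prob ** (draft_len + 1)) / (1.0 - accept_prob)


if __name__ == "__main__":
    draft_len = 4
    # Made-up acceptance rates: easier task -> drafts accepted more often.
    for label, p in [("low reasoning", 0.9), ("high reasoning", 0.6)]:
        tokens = expected_tokens_per_step(p, draft_len)
        print(f"{label}: ~{tokens:.2f} tokens per pass")
```

Under this model the same draft head gives a noticeably smaller speedup on the harder task, which is why having separate MTP heads for different cases would be a reasonable thing to do.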
