
girvo, last Saturday at 10:50 PM

Being grafted onto the main model reduces the layer duplication you'd otherwise have, at least for Step and Qwen 3.6.


Replies

alfiedotwtf, yesterday at 11:18 AM

Step 2.7's MTP seems broken (at least in ik_llama.cpp): the draft model starts and ends in block 3, but ik_llama bails out looking for block 0 :(
