logoalt Hacker News

Karumatoday at 1:52 AM1 replyview on HN

Wow, every single word in the original post and on that README.md is pure LLM. How sad.

In any case, this has been done at least since the very first public releases of Llama by Meta... It also works for image models. There are even a few ComfyUI nodes that let you pick layers to duplicate on the fly, so you can test as many as you want really quickly.


Replies

xlayntoday at 2:22 AM

Fair point on the writing style, I used Claude extensively on this project, including drafting. The experiments and ideas are mine though.

On the prior art: you're right that layer duplication has been explored before. What I think is new here is the systematic sweep toolkit + validation on standard benchmarks (lm-eval BBH, GSM8K, MBPP) showing exactly which 3 layers matter for which model. The Devstral logical deduction result (0.22→0.76) was a surprise to me.

If there are ComfyUI nodes that do this for image models, I'd love links, the "cognitive modes" finding (different duplication patterns that leads to different capability profiles from the same weights) might be even more interesting for diffusion models.

show 1 reply