Hacker News

regularfry today at 1:18 PM

> The weird part: different duplication patterns create different cognitive "modes" from the same weights. Double-pass boosts math. Triple-pass boosts emotional reasoning. Interleaved doubling (13,13,14,14,15,15,16) creates a pure math specialist. Same model, same VRAM, different routing.

As far as I can see, that's not implied by the original post.

But that's beside the point: quoting the bit where the poster says "here's what I'm building on top of" and using that to imply they haven't done anything new is a bit pointless, no?
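The quoted patterns have one plausible reading: a "pass" re-runs a contiguous block of layers, and "interleaved doubling" runs each listed layer twice back to back. Under that reading they can be sketched in Python. The helper names (`duplicated_path`, `interleaved_path`, `run`) are hypothetical and not from the original post, and the layers are toy functions standing in for transformer blocks. A repeated index refers to the same layer object, so a different path changes the routing without creating new weights.

```python
from typing import Callable, Iterable, List, Sequence

Layer = Callable[[float], float]


def duplicated_path(n_layers: int, start: int, end: int, passes: int = 2) -> List[int]:
    """Hypothetical helper: run layers [start, end) `passes` times in a row, all others once."""
    block = list(range(start, end))
    return list(range(start)) + block * passes + list(range(end, n_layers))


def interleaved_path(n_layers: int, doubled: Iterable[int]) -> List[int]:
    """Hypothetical helper: run each layer in `doubled` twice back to back (13,13,14,14,...)."""
    doubled_set = set(doubled)
    path: List[int] = []
    for i in range(n_layers):
        path.extend([i, i] if i in doubled_set else [i])
    return path


def run(layers: Sequence[Layer], path: Sequence[int], x: float) -> float:
    # Repeated indices point at the same layer objects, so no new weights
    # are created: only the execution order changes.
    for i in path:
        x = layers[i](x)
    return x


if __name__ == "__main__":
    # Toy "layers": layer k just adds k, standing in for a real transformer block.
    layers = [lambda x, k=k: x + k for k in range(20)]
    print(interleaved_path(20, [13, 14, 15])[12:20])  # [12, 13, 13, 14, 14, 15, 15, 16]
    print(duplicated_path(8, 2, 5, passes=3))  # [0, 1, 2, 3, 4, 2, 3, 4, 2, 3, 4, 5, 6, 7]
    print(run(layers, [0, 1, 1], 0.0))  # 2.0
```

The same `layers` list serves every path. Only the index sequence differs between a "double-pass", a "triple-pass" and an interleaved configuration.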


Replies

simgt today at 1:43 PM

You're right that my quote was misleading; I overlooked "the weird part" in the post because it didn't seem new to me either.

Here's the section in the original post that covers it: https://dnhkng.github.io/posts/rys/#the-brain-scanner

All heatmaps are split by task, and each one shows its own optimal point. The routing he finally chose is a trade-off between the two tasks; there isn't much else to do unless you intend to train a router anyway.

> So the ‘math organ’ has boundaries on both sides. Too few layers and you get nothing — you’ve cut into the circuit and it can’t complete its operation. Too many layers and you also get nothing — you’ve included tissue from a neighbouring circuit that doesn’t belong. Pre-training carved these structures out of the layer stack, and they only work whole.

It also doesn't translate to other tasks: the heatmap for EQ scores doesn't have this patch.
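Here is a minimal sketch of the per-task optimum versus the trade-off. It assumes each heatmap cell scores one configuration `(start, end)` in which layers `[start, end)` are repeated once. The function names and the toy scores are invented for illustration and are not taken from the post's heatmaps. A plain average is just one possible way to weigh the tasks against each other.

```python
from typing import Dict, Tuple

# Assumed encoding: (start, end) means layers [start, end) are repeated once.
Config = Tuple[int, int]
Heatmap = Dict[Config, float]


def best_config(scores: Heatmap) -> Config:
    # The "optimal point" on a single task's heatmap (sorted for deterministic ties).
    return max(sorted(scores), key=lambda c: scores[c])


def compromise_config(per_task: Dict[str, Heatmap]) -> Config:
    # Trade-off across tasks: highest mean score among configs every heatmap covers.
    common = set.intersection(*(set(h) for h in per_task.values()))
    return max(
        sorted(common),
        key=lambda c: sum(h[c] for h in per_task.values()) / len(per_task),
    )


if __name__ == "__main__":
    # Invented toy scores. The math peak at (2, 5) drops sharply if the block is
    # cut short ((3, 5), (2, 4)) or widened ((1, 5), (2, 6)), echoing the quote.
    math_scores = {(2, 5): 0.9, (3, 5): 0.1, (2, 4): 0.1, (1, 5): 0.1, (2, 6): 0.3, (3, 6): 0.1}
    eq_scores = {(2, 5): 0.0, (3, 5): 0.1, (2, 4): 0.0, (1, 5): 0.2, (2, 6): 0.8, (3, 6): 0.9}
    print(best_config(math_scores))  # (2, 5)
    print(best_config(eq_scores))  # (3, 6)
    print(compromise_config({"math": math_scores, "eq": eq_scores}))  # (2, 6)
```

With these toy numbers, neither task's optimum wins the compromise. Swapping the mean for a minimum would favour configurations that don't hurt either task, at the cost of peak scores.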