taykolasinski, yesterday at 4:16 PM

This is a fantastic catch. I hadn't realized Gemma 3n was already shipping with a variant of this in production.

It feels like we are entering the era of residual stream engineering. For a long time, the standard x + F(x) additive backbone was treated as untouchable. Now, between mHC (weighted scaling) and LAuReL (low-rank projections), labs are finally finding stable ways to make that signal path more dynamic.
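To make the comparison concrete, here is a rough sketch of the three signal paths mentioned above: the plain x + F(x) backbone, a scalar-weighted variant, and a variant with a low-rank correction on the skip path. This is a toy illustration in plain Python, not the exact formulation from either paper; the function names, the `alpha`/`beta` weights, the `down`/`up` matrices, and the toy layer `f` are all assumptions made up for the example.

```python
import random

def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def plain_residual(x, f):
    # Standard additive backbone: x + F(x)
    return [xi + fi for xi, fi in zip(x, f(x))]

def weighted_residual(x, f, alpha, beta):
    # Hypothetical scalar-weighted variant: alpha * F(x) + beta * x
    return [alpha * fi + beta * xi for xi, fi in zip(x, f(x))]

def low_rank_residual(x, f, down, up):
    # Hypothetical low-rank variant: F(x) + x + up @ (down @ x)
    # down has shape (r, d) and up has shape (d, r), with r much smaller than d
    correction = matvec(up, matvec(down, x))
    return [fi + xi + ci for xi, fi, ci in zip(x, f(x), correction)]

d, r = 8, 2
rng = random.Random(0)
x = [rng.uniform(-1, 1) for _ in range(d)]
f = lambda v: [0.5 * vi for vi in v]  # toy stand-in for a real layer

down = [[rng.uniform(-0.1, 0.1) for _ in range(d)] for _ in range(r)]
up_zero = [[0.0] * r for _ in range(d)]  # zero-initialised up-projection

base = plain_residual(x, f)
weighted_identity = weighted_residual(x, f, alpha=1.0, beta=1.0)
low_rank_at_init = low_rank_residual(x, f, down, up_zero)

# Extra parameters on the skip path: low-rank factors vs. a full d x d matrix
low_rank_params = 2 * d * r
full_params = d * d
```

With `alpha = beta = 1` or a zero-initialised up-projection, both variants reduce to the plain residual, so in this sketch a model could start from the standard backbone and learn a departure from it. The low-rank factors also add only `2 * d * r` parameters per layer instead of `d * d`.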

I'm curious whether the low-rank constraint in LAuReL acts as a natural stabilizer against the gradient explosion I saw with unconstrained hyper-connections.

Thanks for the paper link, definitely reading that tonight.


Replies

cpldcpu, yesterday at 5:46 PM

Thanks! Would be quite interesting to see how this fares compared to mHC.

I noticed that LAuReL is cited in the mHC paper, but they describe it as "expanding the width of the residual stream", which is rather odd.