hintymad • today at 5:52 PM • 3 replies

> Every weight tensor in Rio is, to thousands of standard deviations, the same 0.6/0.4 blend of Nex and Qwen — across all 60 layers and every component of the network. Other finetunes cannot be explained as interpolations.

I find it amazing how robust current deep learning models are. A simple linear combination of every weight tensor did not degrade the model's performance but enhanced it.
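For anyone wondering what a blend like that looks like mechanically, here is a minimal sketch in plain Python. It is not how Rio was built or analyzed; the tensor name, the random stand-in weights, and the helpers `blend` and `fit_alpha` are all made up for illustration. It mixes two models tensor by tensor with a fixed coefficient, then recovers that coefficient by least squares, which is roughly the kind of check you could run to ask whether a tensor is explainable as an interpolation of two others.

```python
import random


def blend(state_a, state_b, alpha):
    """Return alpha * A + (1 - alpha) * B for every tensor name in state_a.

    Tensors are represented as flat lists of floats (an assumption made
    to keep the sketch dependency-free).
    """
    return {
        name: [alpha * a + (1.0 - alpha) * b
               for a, b in zip(state_a[name], state_b[name])]
        for name in state_a
    }


def fit_alpha(merged, a, b):
    """Least-squares alpha minimizing ||merged - (alpha*a + (1-alpha)*b)||^2.

    Rearranging gives merged - b = alpha * (a - b), so
    alpha = <merged - b, a - b> / <a - b, a - b>.
    """
    num = sum((m - y) * (x - y) for m, x, y in zip(merged, a, b))
    den = sum((x - y) ** 2 for x, y in zip(a, b))
    return num / den


# Hypothetical stand-ins for two parent models with one weight tensor each.
rng = random.Random(0)
model_a = {"layer0.weight": [rng.gauss(0, 1) for _ in range(1000)]}
model_b = {"layer0.weight": [rng.gauss(0, 1) for _ in range(1000)]}

merged = blend(model_a, model_b, 0.6)
alpha_hat = fit_alpha(
    merged["layer0.weight"],
    model_a["layer0.weight"],
    model_b["layer0.weight"],
)
print(alpha_hat)  # should be very close to the 0.6 used above
```

With real checkpoints you would run the fit per tensor and look at how tightly the recovered coefficients cluster; a tensor that was finetuned independently would leave a large residual instead of a clean fit.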

Replies

woadwarrior01 • today at 6:13 PM

It's a well-known idea[1], although it's still surprising that something so simple even works.

[1]: https://arxiv.org/abs/2203.05482
