Hacker News

hintymad today at 5:52 PM (3 replies)

> Every weight tensor in Rio is, to thousands of standard deviations, the same 0.6/0.4 blend of Nex and Qwen — across all 60 layers and every component of the network. Other finetunes cannot be explained as interpolations.

I find it amazing how robust current deep learning models are. A simple linear combination of every weight did not degrade the model's performance; it improved it.
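
For anyone who wants to see what "a linear combination of every weight" means concretely, here is a minimal sketch in plain Python. Everything in it is an assumption for illustration: the toy "checkpoints" are dicts of flat lists with made-up names and values, the 0.6 weight is placed on the first model, and `blend` and `estimate_alpha` are hypothetical helpers, not anything from the article or from a real library. The second function shows one way you could recover the mixing coefficient from a merged checkpoint with a least-squares fit.

```python
def blend(a, b, alpha):
    """Return alpha * a + (1 - alpha) * b for every tensor in two checkpoints."""
    if a.keys() != b.keys():
        raise ValueError("checkpoints must share the same parameter names")
    return {
        name: [alpha * x + (1.0 - alpha) * y for x, y in zip(a[name], b[name])]
        for name in a
    }


def estimate_alpha(merged, a, b):
    """Least-squares alpha such that merged is close to alpha * a + (1 - alpha) * b.

    Assumes a and b differ in at least one entry; otherwise the
    denominator is zero and alpha is not identifiable.
    """
    num = den = 0.0
    for name in merged:
        for m, x, y in zip(merged[name], a[name], b[name]):
            d = x - y
            num += (m - y) * d
            den += d * d
    return num / den


# Toy stand-ins for two checkpoints; names and values are made up.
nex = {"layer0.weight": [1.0, 2.0, 3.0], "layer0.bias": [0.5, -0.5]}
qwen = {"layer0.weight": [0.0, 4.0, 1.0], "layer0.bias": [1.5, 0.5]}

# Assumption: the 0.6 weight goes on the first model.
rio = blend(nex, qwen, alpha=0.6)
print(round(estimate_alpha(rio, nex, qwen), 6))
```

On a real model you would do the same thing per tensor over the actual state dicts. A merged checkpoint that really is an interpolation gives the same alpha for every tensor with a near-zero residual, while an ordinary finetune would not.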


Replies

woadwarrior01 today at 6:13 PM

It's a well-known idea[1], although it's still surprising that something so simple even works.

[1]: https://arxiv.org/abs/2203.05482

randall today at 6:43 PM

[dead]