Hacker News

incrudible today at 9:21 AM

You need to train independently and merge rarely. The problem is the merge step: weights are too entangled, so you are not going to get an improvement commensurate with the effort. Otherwise, everyone would do it. It is an open research problem.
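
One concrete face of that entanglement is hidden-unit permutation symmetry. The snippet below is a rough sketch, not anyone's actual merge method: the random weights stand in for trained ones, and `mlp`, the layer sizes, and the cyclic shift are all made up for illustration. Models A and B compute exactly the same function, yet their element-wise average does not.

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp(x, W1, W2):
    # One hidden layer with ReLU and no biases (hypothetical toy model).
    return np.maximum(x @ W1, 0.0) @ W2

# Model A: random weights standing in for a trained network (assumption).
W1_a = rng.normal(size=(4, 8))
W2_a = rng.normal(size=(8, 2))

# Model B: the same function as A, with the hidden units reordered
# by a cyclic shift (columns of W1 and rows of W2 move together).
perm = np.roll(np.arange(8), 1)
W1_b = W1_a[:, perm]
W2_b = W2_a[perm, :]

x = rng.normal(size=(16, 4))
same_function = np.allclose(mlp(x, W1_a, W2_a), mlp(x, W1_b, W2_b))

# Naive merge: element-wise average of the two weight sets.
W1_m = (W1_a + W1_b) / 2
W2_m = (W2_a + W2_b) / 2
merge_error = np.abs(mlp(x, W1_m, W2_m) - mlp(x, W1_a, W2_a)).max()

print("A and B compute the same function:", same_function)
print("max output gap after naive averaging:", merge_error)
```

Independently trained models are not exact permutations of each other, so the real situation is messier than this, but the toy case shows why averaging coordinates that do not line up is not free.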


Replies

filup today at 10:33 AM

That sounds like the way. Everyone trains on their own small problem until the weights are maximally compressed, and then merges.