incrudible • today at 9:21 AM

You need to train independently and merge rarely. The problem is the merge step. Weights are too entangled; you are not going to get an improvement commensurate with the effort. Otherwise, everyone would do it. It is an open research problem.
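
To make the merge problem concrete, here is a minimal sketch of one well-known reason naive merging fails: hidden units can be permuted without changing what a network computes, so averaging weights position by position can mix units that play different roles. The tiny two-unit ReLU network, the weight values, and the helper names (`forward`, `average`) are all assumptions made up for illustration. This is only one reading of "entangled", not necessarily the full sense intended above.

```python
def relu(v):
    return max(0.0, v)

def forward(model, x):
    """One-hidden-layer ReLU net with scalar input and output, no biases."""
    w_in, w_out = model
    return sum(o * relu(i * x) for i, o in zip(w_in, w_out))

def average(model_a, model_b):
    """Naive merge: element-wise mean of corresponding weights."""
    return tuple(
        [(a + b) / 2 for a, b in zip(layer_a, layer_b)]
        for layer_a, layer_b in zip(model_a, model_b)
    )

# Both models compute |x| = relu(x) + relu(-x).
# model_b is model_a with its two hidden units swapped.
model_a = ([1.0, -1.0], [1.0, 1.0])
model_b = ([-1.0, 1.0], [1.0, 1.0])
merged = average(model_a, model_b)

xs = [-2.0, -0.5, 0.0, 1.0, 3.0]
outputs_a = [forward(model_a, x) for x in xs]
outputs_b = [forward(model_b, x) for x in xs]
outputs_merged = [forward(merged, x) for x in xs]

print("model_a:", outputs_a)
print("model_b:", outputs_b)
print("merged: ", outputs_merged)
```

The two source models agree on every input, yet their average has input weights of zero and outputs zero everywhere. Real merging methods try to align units before averaging, but in this toy case the naive merge destroys the function entirely.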

Replies

filup • today at 10:33 AM

That sounds like the way. Everyone trains on their own small problem until the weights are maximally compressed, and then merges.

alt Hacker News
