Can someone please explain or link to some information about how models are merged? Is this genuinely merging weights mathematically or some kind of distillation (presumably not if they’ve done zero training as the post suggests).
But yes, in general, merging refers to techniques that directly blend the weights of different models mathematically. It had a big moment of popularity ~2 years ago, with many so-called "Frankenmodels" popping up on leaderboards.
I tend to think of merging as belonging to the same general umbrella as things like "abliteration", or other techniques that surgically modify the weights of a model without a traditional training/tuning loop. Maxime Labonne is a great person to follow if you're interested in this general area.
This is a good starting point: https://huggingface.co/docs/peft/developer_guides/model_merg...
But yes, in general, merging refers to techniques that directly blend the weights of different models mathematically. It had a big moment of popularity ~2 years ago, with many so-called "Frankenmodels" popping up on leaderboards.
I tend to think of merging as belonging to the same general umbrella as things like "abliteration", or other techniques that surgically modify the weights of a model without a traditional training/tuning loop. Maxime Labonne is a great person to follow if you're interested in this general area.