AnotherGoodName • today at 4:32 PM • 5 replies

It's fascinating that it worked, though. Can we just merge all the open-weight models and get something better?

Replies

nylonstrung • today at 5:58 PM

If you go to Civitai, this is pretty much how it works in that corner of the image-generation world.

Everything uses Stable Diffusion as the underlying model, and most of what people actually run are merged checkpoints.
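For anyone who hasn't seen checkpoint merging up close: the simplest variant is a weighted average of the parameters of checkpoints that share one architecture. A minimal sketch in plain Python, assuming each checkpoint is a dict mapping parameter names to flat lists of floats (real tools operate on tensors, e.g. PyTorch state dicts). The name `linear_merge` and the toy checkpoints are hypothetical, and this is not a claim about how any specific Civitai tool works.

```python
from typing import Dict, List, Sequence

# Hypothetical checkpoint representation: parameter name -> flat list of weights.
Checkpoint = Dict[str, List[float]]


def linear_merge(checkpoints: Sequence[Checkpoint], weights: Sequence[float]) -> Checkpoint:
    """Weighted average of parameters across checkpoints with identical layouts."""
    if not checkpoints or len(checkpoints) != len(weights):
        raise ValueError("need one weight per checkpoint")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    norm = [w / total for w in weights]  # normalize so the result is a convex combination

    merged: Checkpoint = {}
    for name, first in checkpoints[0].items():
        merged[name] = [0.0] * len(first)
        for ckpt, w in zip(checkpoints, norm):
            params = ckpt[name]  # raises KeyError if a checkpoint lacks this parameter
            if len(params) != len(first):
                raise ValueError(f"shape mismatch for {name}")
            for i, p in enumerate(params):
                merged[name][i] += w * p
    return merged


# Two toy "finetunes" of the same base model.
tune_a = {"layer.weight": [1.0, 2.0], "layer.bias": [0.0]}
tune_b = {"layer.weight": [3.0, 4.0], "layer.bias": [1.0]}
print(linear_merge([tune_a, tune_b], [0.5, 0.5]))
# {'layer.weight': [2.0, 3.0], 'layer.bias': [0.5]}
```

A merge of merges, as described further down the thread, is still just a weighted average of the original finetunes, with the weights multiplied through.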

wds • today at 4:43 PM

I imagine it'd work the same as merging all the good-tasting foods to get an even tastier one

avereveard • today at 5:05 PM

Most merges improve a small subset of "feeling" benchmarks (too small, too specific, or out of distribution) and tend to show degradation on actual benchmarks, with especially punishing results on long-chain benchmarks.

They also only work on matching architectures (i.e. finetunes/LoRAs of the same model).
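The matching-architecture constraint can be made concrete: an element-wise merge is only defined when both checkpoints have the same parameter names and shapes, which typically holds for finetunes of one base model and fails across model families. A minimal sketch, assuming each checkpoint is summarized as a dict of parameter name to shape tuple; `merge_compatible` and the made-up layouts below are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: parameter name -> tensor shape.
Layout = Dict[str, Tuple[int, ...]]


def merge_compatible(a: Layout, b: Layout) -> List[str]:
    """Return the problems that block an element-wise merge (empty list if none)."""
    problems = []
    for name in sorted(a.keys() - b.keys()):
        problems.append(f"missing in second: {name}")
    for name in sorted(b.keys() - a.keys()):
        problems.append(f"missing in first: {name}")
    for name in sorted(a.keys() & b.keys()):
        if a[name] != b[name]:
            problems.append(f"shape mismatch for {name}: {a[name]} vs {b[name]}")
    return problems


# Made-up shapes for illustration only.
finetune_1 = {"embed": (32000, 4096), "mlp.up": (4096, 11008)}
finetune_2 = {"embed": (32000, 4096), "mlp.up": (4096, 11008)}
other_family = {"embed": (151936, 4096), "mlp.up": (4096, 22016), "mlp.gate": (4096, 22016)}

print(merge_compatible(finetune_1, finetune_2))    # []
print(merge_compatible(finetune_1, other_family))  # three problems
```

Matching shapes are necessary but not sufficient: two independently trained models with an identical layout generally don't merge well, because their internal units aren't aligned the way finetunes of a shared base are.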

dindunuf • today at 4:59 PM

That kinda worked in the Llama 1/2 era, not between different models but between finetunes of the same model. The briefly legendary Mythomax was, IIRC, a merge of 5+ tunes, some of which were merges themselves.

_3u10 • today at 4:38 PM

No, they need the same arch, but you can distill them into a single model. And yes, if you use the API directly, Claude will often say it's an open-weight model (likely the ones it was distilled from).
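Distillation sidesteps the same-architecture requirement because the student learns from the teacher's outputs rather than its weights. A minimal sketch of the common soft-label distillation loss (KL divergence between temperature-softened teacher and student distributions, scaled by T²), written in plain Python over a single token's logits. The function names and the default temperature are illustrative assumptions; real training uses a tensor library over whole batches.

```python
import math
from typing import List


def softmax(logits: List[float], temperature: float) -> List[float]:
    """Temperature-scaled softmax."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits: List[float], teacher_logits: List[float],
                      temperature: float = 2.0) -> float:
    """KL(teacher || student) on softened distributions, scaled by T^2."""
    if len(student_logits) != len(teacher_logits):
        raise ValueError("student and teacher must share an output vocabulary")
    p = softmax(teacher_logits, temperature)  # soft targets from the teacher
    q = softmax(student_logits, temperature)  # student predictions
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return kl * temperature ** 2


teacher = [2.0, 1.0, 0.1]
print(distillation_loss(teacher, teacher))               # 0.0
print(distillation_loss([0.0, 0.0, 0.0], teacher) > 0)   # True
```

This logit-level form still needs a shared output vocabulary; training the student on text the teacher generated avoids even that, at the cost of losing the teacher's full probability distribution.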
