Hacker News

stingraycharles, last Tuesday at 3:05 AM

Well, intuitively it makes sense that within each independent model a small number of weights/parameters are very dominant, but it’s still super interesting that these can be swapped between all the models without loss of performance.

It isn’t obvious that these parameters are universal across all models.