Hacker News

0x3f · today at 5:26 PM

If model architecture doesn't matter much, how come transformers changed everything?


Replies

visarga · today at 5:32 PM

Luck. RNNs, Mamba, S4, etc. can do it just as well for a given budget of compute and data. The larger the model, the less the architecture matters: it will learn with any of the 10,000 variations that have been tried, and land within about 10-15% of the best. What you need is a data loop, or a data source of exceptional quality and size; data has more leverage. Architecture games are mostly about efficiency: one method can be 10x more efficient than another.
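The claim can be sketched with a toy Chinchilla-style scaling law, loss(N) = E + A / N^alpha, where N is parameter count. All constants below are made up for illustration (not fitted to any real architecture); the point is just that a small difference in the law's constants produces a noticeable gap at small scale and a shrinking one at large scale:

```python
# Hypothetical scaling-law constants for two architectures.
# E = irreducible loss, A and alpha shape the power-law term.
# These numbers are illustrative, not measured.
archs = {
    "transformer": {"E": 1.69, "A": 406.0, "alpha": 0.34},
    "rnn_like":    {"E": 1.69, "A": 480.0, "alpha": 0.33},
}

def loss(N, p):
    """Toy loss as a function of parameter count N."""
    return p["E"] + p["A"] / N ** p["alpha"]

for N in (1e8, 1e11):  # small vs. large model
    losses = {name: loss(N, p) for name, p in archs.items()}
    best = min(losses.values())
    for name, l in losses.items():
        gap = 100 * (l - best) / best
        print(f"N={N:.0e}  {name:12s} loss={l:.3f}  gap={gap:.1f}%")
```

Under these assumed constants, the relative gap between the two curves is around 10% at 1e8 parameters but only a few percent at 1e11, matching the intuition that scale washes out architectural differences.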
