If model arch doesn't matter much, how come transformers changed everything?

0x3f • today at 5:26 PM

Replies

Luck. RNNs, Mamba, S4, etc. can do it just as well for a given budget of compute and data. The larger the model, the less difference architecture makes. It will learn in any of the 10,000 variations that have been tried and come within about 10-15% of the best. What you need is a data loop, or a data source of exceptional quality and size; data has more leverage. Architecture games are mostly about efficiency: one method can be 10x more efficient than another.

