The paper “Were RNNs All We Needed?” explores this hypothesis a bit, finding that some pre-transformer sequence models can match transformers when trained at appropriate scale. The authors did, however, have to make some modifications to those models to unlock more parallelism during training.
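
The core modification, roughly, is to make the recurrent gates depend only on the current input rather than on the previous hidden state, which turns the recurrence into a simple linear scan that can be evaluated in parallel over the sequence. Below is a minimal sketch of that idea in the style of the paper's minGRU variant; the function and weight names (`min_gru_scan`, `w_z`, `w_h`) are my own placeholders, not code from the paper.

```python
import jax
import jax.numpy as jnp

def min_gru_scan(x, w_z, w_h, h0):
    """Sketch of a minGRU-style recurrence evaluated with a parallel scan.

    Because the gate z_t and the candidate state depend only on x_t (not on
    h_{t-1}), the update h_t = (1 - z_t) * h_{t-1} + z_t * h_tilde_t is a
    first-order linear recurrence h_t = a_t * h_{t-1} + b_t, which an
    associative scan can compute in parallel across the sequence length.
    """
    z = jax.nn.sigmoid(x @ w_z)   # gate: a function of the input only
    h_tilde = x @ w_h             # candidate state: also input only
    a = 1.0 - z                   # per-step decay coefficient
    b = z * h_tilde               # per-step additive term

    # Fold the initial hidden state into the first step's additive term.
    b = b.at[0].add(a[0] * h0)

    def combine(left, right):
        # Compose two affine maps h -> a*h + b (left applied first).
        a_l, b_l = left
        a_r, b_r = right
        return a_l * a_r, a_r * b_l + b_r

    _, h = jax.lax.associative_scan(combine, (a, b))
    return h                      # h[t] is the hidden state after step t

# Toy usage: sequence length 6, input dim 4, hidden dim 8.
key = jax.random.PRNGKey(0)
x = jax.random.normal(key, (6, 4))
w_z = 0.1 * jax.random.normal(key, (4, 8))
w_h = 0.1 * jax.random.normal(key, (4, 8))
h = min_gru_scan(x, w_z, w_h, h0=jnp.zeros(8))
print(h.shape)  # (6, 8)
```

The associative combine is what buys the parallelism: composing the per-step affine updates is associative, so the whole sequence can be reduced in logarithmic depth instead of strictly step by step.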