0xbadcafebee • today at 5:03 PM

They did do that, 2 years ago. The problems are that 1) Mamba's accuracy gets worse as context size grows, 2) Nvidia GPUs are designed for transformers, and 3) all the software out there is also designed for transformers. It's still useful in some applications, but it doesn't beat regular transformers if you have the gear.