Some of NVIDIA's models also tend to have interesting architectures. For example, some use the Mamba architecture instead of a purely transformer-based one: https://developer.nvidia.com/blog/inside-nvidia-nemotron-3-t...
Deep SSMs, including the whole S4-to-Mamba lineage, are a very interesting alternative to transformers. In some of my genomics use cases, Mamba has been easier to train and scale over large context windows than transformers.
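For anyone curious why that helps with long contexts: the core of an SSM layer is a constant-size recurrent state updated once per token, so a forward pass scans the sequence in O(L) rather than attention's O(L^2). Below is a minimal sketch of a Mamba-style selective scan (NumPy, single channel). The shapes and parameterization are illustrative assumptions, not NVIDIA's model or the official mamba-ssm implementation:

```python
import numpy as np

def selective_ssm_scan(x, A, B, C, delta):
    """Minimal selective SSM recurrence (Mamba-style), one channel.

    x:     (L,)    input sequence
    A:     (N,)    diagonal state matrix (negative entries for stability)
    B, C:  (L, N)  input-dependent projections (the "selective" part)
    delta: (L,)    input-dependent step sizes
    Returns y: (L,). O(L*N) time with O(N) state, vs O(L^2)
    for attention over the same sequence.
    """
    L, N = B.shape
    h = np.zeros(N)
    y = np.empty(L)
    for t in range(L):
        # Zero-order-hold discretization of the continuous SSM.
        A_bar = np.exp(delta[t] * A)          # (N,)
        B_bar = (A_bar - 1.0) / A * B[t]      # (N,)
        h = A_bar * h + B_bar * x[t]          # constant-size state update
        y[t] = C[t] @ h                       # readout
    return y

# Toy usage: a long sequence, tiny hidden state (sizes are made up).
L, N = 4096, 16
rng = np.random.default_rng(0)
x = rng.standard_normal(L)
A = -np.exp(rng.standard_normal(N))                   # negative eigenvalues
B = rng.standard_normal((L, N)) * 0.1
C = rng.standard_normal((L, N)) * 0.1
delta = np.exp(0.1 * rng.standard_normal(L) - 2.0)    # small positive steps
y = selective_ssm_scan(x, A, B, C, delta)
print(y.shape)  # (4096,)
```

Real implementations replace this Python loop with a parallel scan on GPU, which is where the practical speed over attention at long L comes from.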