Is there a reason we don’t switch halfway through? i.e., start with a classic LLM and switch to something linear like Mamba as context grows?
I'm glad I clicked through because I thought the article was about Mamba, the package manager I associate with Python (similar to conda).
I'm not sure that I buy their conclusion that more compute during inference is good.
Yes, batch=1 inference is mostly memory-bandwidth bound, not GPU-compute bound. But no provider does batch=1 inference. Everyone groups all the requests into a batch, and the GPU computes them together.
With a fused kernel, that means the GPU streams the tensors from VRAM, and does a bunch of compute on different conversations in the batch, at the same time.
If they increase the amount of compute required per token, that just reduces the maximum batch size a GPU can handle. In practice, yes, this does mean each GPU can serve fewer users. Providers don't normally leave GPU cores idle during inference.
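The argument above can be made concrete with a back-of-envelope roofline calculation. The sketch below uses purely illustrative numbers (a ~7B fp16 model, assumed 2 TB/s bandwidth, assumed 300 TFLOP/s peak) to show why batch=1 decoding is bandwidth bound, and why raising the per-token FLOP cost lowers the batch size at which compute becomes the bottleneck:

```python
# Roofline sketch of batched LLM decoding. All figures are hypothetical
# assumptions for illustration, not measured values for any real GPU/model.

WEIGHT_BYTES = 14e9      # ~7B params in fp16 -> ~14 GB streamed per step
BANDWIDTH = 2e12         # assumed HBM bandwidth: 2 TB/s
PEAK_FLOPS = 300e12      # assumed GPU peak throughput: 300 TFLOP/s
FLOPS_PER_TOKEN = 14e9   # ~2 FLOPs per weight per decoded token

def decode_step_time(batch: int) -> float:
    """Time for one decoding step: weights are streamed from VRAM once
    per step regardless of batch size, while compute scales with batch."""
    mem_time = WEIGHT_BYTES / BANDWIDTH                 # fixed cost
    compute_time = batch * FLOPS_PER_TOKEN / PEAK_FLOPS # grows with batch
    return max(mem_time, compute_time)                  # larger side wins

# Batch size at which compute time catches up with memory time. Past this
# point the GPU is compute bound, so increasing FLOPs per token (as Mamba-3
# style designs may) shrinks this crossover and caps users per GPU.
crossover = (WEIGHT_BYTES / BANDWIDTH) * PEAK_FLOPS / FLOPS_PER_TOKEN
print(round(crossover))  # -> 150 under these assumed numbers
```

With these made-up numbers, batches below ~150 waste compute waiting on memory, and batches above it waste bandwidth waiting on compute, which is the sense in which "extra inference compute is free" only holds up to a point.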
> Mamba-3 is a new state space model (SSM) designed with inference efficiency as the primary goal — a departure from Mamba-2, which optimized for training speed. The key upgrades are a more expressive recurrence formula, complex-valued state tracking, and a MIMO (multi-input, multi-output) variant that boosts accuracy without slowing down decoding.
Why can’t they simply say:
Mamba-3 focuses on being faster and more efficient when making predictions, rather than just being fast to train like Mamba-2.
I'm looking forward to comparing this to Inception 2 (the text diffusion model) which in my experience is very fast and reasonably high quality.