Hacker News

syntaxing, last Wednesday at 2:40 AM

I feel like calling it a “30B” model is slightly disingenuous. It’s a 30B-A3B, so only 3B parameters are active at a given time. It’s still impressive, but getting 8 T/s from an “A3B” is very different from getting 8 T/s from a dense 30B.
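A rough back-of-envelope sketch of why the two cases differ, assuming token generation is limited by how fast weights can be read from memory. The 2.70 bits per weight comes from the quant name mentioned elsewhere in the thread; the bandwidth figure is a made-up placeholder, not a measurement of any real device, and the estimate ignores KV cache traffic, attention, and compute. All 30B weights still have to fit in memory either way; only the per-token read traffic shrinks.

```python
def bytes_per_token(active_params: float, bits_per_weight: float) -> float:
    """Approximate weight bytes read from memory to generate one token."""
    return active_params * bits_per_weight / 8


def est_tokens_per_sec(active_params: float,
                       bits_per_weight: float,
                       bandwidth_bytes_per_sec: float) -> float:
    """Upper-bound estimate assuming decode is purely memory-bandwidth bound."""
    return bandwidth_bytes_per_sec / bytes_per_token(active_params, bits_per_weight)


BITS_PER_WEIGHT = 2.70   # from the quant name in the thread
BANDWIDTH = 10e9         # hypothetical 10 GB/s, an assumption for illustration only

moe = est_tokens_per_sec(3e9, BITS_PER_WEIGHT, BANDWIDTH)     # ~3B active params
dense = est_tokens_per_sec(30e9, BITS_PER_WEIGHT, BANDWIDTH)  # dense 30B

print(f"A3B estimate:   {moe:.2f} tokens/s")
print(f"dense estimate: {dense:.2f} tokens/s")
print(f"ratio:          {moe / dense:.1f}x")
```

Under these assumptions the ratio depends only on active parameter count, so the same hardware would be expected to run the A3B model roughly ten times faster than a dense 30B at the same quantization.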


Replies

CamperBob2, last Wednesday at 4:45 AM

Out of curiosity, I just tried Qwen3-30B-A3B-Instruct-2507-Q3_K_S-2.70bpw.gguf (the version they recommend for the Raspberry Pi) on a Blackwell GPU. It cranked out 200+ tokens per second on some private benchmark queries, and it is surprisingly sharp.

It punches well above the weight class expected from 3B active parameters. You could build the bear in Spielberg's "AI" with this thing, if not the kid.

throwaway894345, last Wednesday at 3:55 AM

What does it mean that only 3B parameters are active at a time? Also any indication of whether this was purely CPU or if it’s using the Pi’s GPU?
