Hacker News

netghost, last Tuesday at 10:34 PM

Kind of depends on your Mac, but if it's a relatively recent Apple silicon model… maybe, probably?

> Nemotron 3 Nano is a 3.2B active (3.6B with embeddings) 31.6B total parameter model.

So I don't know the exact math once you have a MoE, but a 3.2B model will run on almost anything, while at 31.6B you're looking at needing a pretty large amount of RAM.
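For a rough sense of that math, here is a minimal sketch. It assumes the weights dominate memory use and that every expert has to be resident, so the total parameter count (not the active count) sets the footprint. The bit widths are illustrative quantization levels, not anything stated about Nemotron's published builds, and the function name is made up for this example.

```python
# Back-of-the-envelope weight footprint for a MoE model.
# Assumption: all experts must be resident, so TOTAL params set the RAM need.
# KV cache, activations, and runtime overhead are deliberately ignored.

TOTAL_PARAMS = 31.6e9   # total parameters, from the quoted model description
ACTIVE_PARAMS = 3.2e9   # parameters active per token, from the same quote


def weight_memory_gb(params: float, bits_per_param: int) -> float:
    """Return the size of `params` weights in decimal gigabytes."""
    return params * bits_per_param / 8 / 1e9


if __name__ == "__main__":
    # Illustrative precisions only: 16-bit, 8-bit, and 4-bit weights.
    for bits in (16, 8, 4):
        total = weight_memory_gb(TOTAL_PARAMS, bits)
        active = weight_memory_gb(ACTIVE_PARAMS, bits)
        print(f"{bits:>2}-bit: total {total:.1f} GB, active per token {active:.1f} GB")
```

Under those assumptions the full model is roughly 63 GB at 16-bit and about 16 GB at 4-bit, before any cache or runtime overhead, which is why the total count is the one that decides whether it fits.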


Replies

vessenes, last Tuesday at 11:13 PM

Given Mac memory bandwidth, you'll generally want to load the whole thing into RAM. You get a speed benefit from the small set of active experts, since Mac compute is slow compared to Nvidia hardware. This should be relatively snappy on a Mac, if you can load the entire thing.
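To put a rough number on the speed benefit, here is a sketch using two common rules of thumb, both assumptions rather than measurements: a forward pass costs about 2 FLOPs per active parameter per token, and decode speed is capped by how fast the active weights can be read from memory. The 400 GB/s bandwidth and 4-bit weights in the usage lines are hypothetical placeholders, not the specs of any particular Mac.

```python
# Rough per-token cost of a MoE versus a dense model of the same total size.
# Assumptions (rules of thumb, not measurements):
#   - about 2 FLOPs per active parameter per generated token
#   - decode is bounded by reading the active weights once per token

TOTAL_PARAMS = 31.6e9   # from the quoted model description
ACTIVE_PARAMS = 3.2e9   # from the quoted model description


def flops_per_token(active_params: float) -> float:
    """Approximate forward-pass FLOPs for one token."""
    return 2.0 * active_params


def max_tokens_per_second(active_params: float, bits_per_param: int,
                          bandwidth_gb_s: float) -> float:
    """Upper bound on decode speed if memory bandwidth is the only limit."""
    bytes_per_token = active_params * bits_per_param / 8
    return bandwidth_gb_s * 1e9 / bytes_per_token


if __name__ == "__main__":
    # How much less work per token the MoE does than a dense 31.6B model.
    ratio = flops_per_token(TOTAL_PARAMS) / flops_per_token(ACTIVE_PARAMS)
    print(f"dense / MoE work per token: {ratio:.1f}x")

    # Hypothetical machine: 400 GB/s memory bandwidth, 4-bit weights.
    moe = max_tokens_per_second(ACTIVE_PARAMS, 4, 400.0)
    dense = max_tokens_per_second(TOTAL_PARAMS, 4, 400.0)
    print(f"bandwidth ceiling: MoE {moe:.0f} tok/s, dense {dense:.0f} tok/s")
```

Real throughput will land well below these ceilings, but the roughly 10x gap in per-token work is the point: the model needs the RAM of a 31.6B model while doing the per-token work of a 3.2B one.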