Hacker News

tempoponet, today at 3:24 AM

It's fine for dense models where you need them in VRAM, less so for MoE where you're offloading layers to RAM. But 32/32 is pretty good for both in the popular ~30B range right now.