
bbor · yesterday at 10:57 PM

I mean, there are lots of models that run on home graphics cards. I'm having trouble finding reliable requirements for this new version, but V3 (from February) has a 32B parameter model that runs on "16GB or more" of VRAM[1], which is very doable for professionals in the first world. Quantization can also help immensely.
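To see where a "16GB or more" figure could come from: weight memory scales linearly with parameter count and bits per weight, so quantizing from 16-bit down to 4-bit cuts it by 4x. A back-of-the-envelope sketch (this ignores KV cache and activation overhead, which add a few GB in practice):

```python
# Rough VRAM needed just for the weights of an N-billion-parameter model.
# Real usage is higher: KV cache, activations, and framework overhead.

def weight_vram_gb(params_billions: float, bits_per_param: float) -> float:
    """Approximate GB of VRAM for the weights alone."""
    return params_billions * 1e9 * bits_per_param / 8 / 1e9

for name, bits in [("FP16", 16), ("INT8", 8), ("INT4", 4)]:
    print(f"32B @ {name}: ~{weight_vram_gb(32, bits):.0f} GB")

# 32B @ FP16: ~64 GB
# 32B @ INT8: ~32 GB
# 32B @ INT4: ~16 GB  <- roughly the quoted "16GB or more" figure
```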

Of course, the smaller models aren't as good at complex reasoning as the bigger ones, but closing that gap entirely seems inherently impossible; there will always be more powerful models that can only run in datacenters (as long as our techniques are constrained by compute, anyway).

FWIW, the small models of today are a lot better than anything I thought I'd live to see as of 5 years ago! Gemma3n (which is built to run on phones[2]!) handily beats GPT-3.5 from January 2023 -- rank ~128 vs. rank ~194 on LMArena[3].

[1] https://blogs.novita.ai/what-are-the-requirements-for-deepse...

[2] https://huggingface.co/google/gemma-3n-E4B-it

[3] https://lmarena.ai/leaderboard/text/overall


Replies

qeternity · yesterday at 11:02 PM

> but V3 (from February) has a 32B parameter model that runs on "16GB or more" of VRAM[1]

No. They released a distilled version of R1 based on a Qwen 32B model. This is not V3, and it's not remotely close to R1 or V3.2.
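For the curious: running that distill on a 16-24GB card generally means quantizing it. A minimal sketch, assuming the standard Hugging Face transformers + bitsandbytes stack; the model ID below is the R1 distill being described here, not V3:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-32B"  # the R1 distill, not V3

# 4-bit quantization brings the 32B weights down to roughly 16 GB
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quant_config,
    device_map="auto",  # spill to CPU RAM if VRAM runs out
)
```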