Hacker News

ActorNightly · yesterday at 9:57 PM

> winning on cost-effectiveness

Nobody is winning in this area until these things run in full on a single graphics card, which is sufficient compute for even most complex tasks.


Replies

JSR_FDED · yesterday at 10:56 PM

Nobody is winning until cars are the size of a pack of cards, which is big enough to transport even the largest cargo.

beefnugs · yesterday at 10:24 PM

Why does that matter? They won't be making at-home graphics cards anymore. Why would you do that when you can be pre-sold $40k servers for years into the future?

bbor · yesterday at 10:57 PM

I mean, there are lots of models that run on home graphics cards. I'm having trouble finding reliable requirements for this new version, but V3 (from February) has a 32B parameter model that runs on "16GB or more" of VRAM[1], which is very doable for professionals in the first world. Quantization can also help immensely.
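To make the quantization point concrete, here's a rough back-of-the-envelope sketch (my own arithmetic, weights only; KV cache and runtime overhead add more on top) of why precision matters so much for fitting a 32B model on a consumer card:

    # Approximate VRAM needed just to hold the model weights.
    # Bytes per parameter: FP16 = 2, INT8 = 1, INT4 = 0.5.
    def weight_vram_gb(params_billions, bytes_per_param):
        return params_billions * 1e9 * bytes_per_param / 1024**3

    for label, bpp in [("FP16", 2.0), ("INT8", 1.0), ("INT4", 0.5)]:
        print(f"32B @ {label}: ~{weight_vram_gb(32, bpp):.0f} GB")
    # 32B @ FP16: ~60 GB  (multiple cards)
    # 32B @ INT8: ~30 GB  (one high-end card)
    # 32B @ INT4: ~15 GB  (fits the 16GB figure cited above)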

Of course, the smaller models aren't as good at complex reasoning as the bigger ones, but closing that gap entirely seems inherently impossible; there will always be more powerful programs that can only run in datacenters (as long as our techniques are constrained by compute, I guess).

FWIW, the small models of today are a lot better than anything I thought I'd live to see as of 5 years ago! Gemma3n (which is built to run on phones[2]!) handily beats ChatGPT 3.5 from January 2023 -- rank ~128 vs. rank ~194 on LLMArena[3].

[1] https://blogs.novita.ai/what-are-the-requirements-for-deepse...

[2] https://huggingface.co/google/gemma-3n-E4B-it

[3] https://lmarena.ai/leaderboard/text/overall
