Hacker News

crystal_revenge, today at 6:34 AM

> I hate being told what technology I can and can't use

Ever since the original GPT-2 "it's too powerful to release!" I've realized that whatever is the current state of open models represents what we really have access to.

It's shocking to me how many people on HN, who engage in long conversations about LLMs and AI, have never actually run a model on their own hardware.

All you need is a reasonably good MacBook Pro or Mac Studio, or an RTX [3-5]090, and you can run useful models at 30 tokens/second or more (much higher if you choose the GPU path). The difference between what you can run on this hardware and what you can run on hardware that costs 2-5x as much is not that big. Don't be fooled by people on Twitter/X claiming you need some outrageous setup.

It's also increasingly clear that frontier models are nowhere near pushing the limits of efficiency. Quantization, MoE, and other techniques have improved dramatically even in the last year.
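As a rough illustration of why quantization matters for this kind of hardware, here is a back-of-the-envelope sketch of weight memory at different precisions. The 30B parameter count, the 24 GiB memory budget, the 0.8 headroom factor, and the function names are all assumptions chosen for illustration; the estimate covers weights only and ignores KV cache, activations, and runtime overhead.

```python
def weight_memory_gib(n_params: float, bits_per_weight: float) -> float:
    """Approximate memory needed for the model weights alone, in GiB."""
    return n_params * bits_per_weight / 8 / 2**30


def fits(n_params: float, bits_per_weight: float,
         memory_gib: float, headroom: float = 0.8) -> bool:
    """True if the weights fit within a fraction (headroom) of the memory.

    The headroom factor is an assumed allowance for KV cache and overhead,
    not a measured value.
    """
    return weight_memory_gib(n_params, bits_per_weight) <= memory_gib * headroom


if __name__ == "__main__":
    n_params = 30e9      # hypothetical 30B-parameter model
    memory_gib = 24      # assumed memory budget of a single consumer GPU
    for bits in (16, 8, 4):
        size = weight_memory_gib(n_params, bits)
        verdict = "fits" if fits(n_params, bits, memory_gib) else "does not fit"
        print(f"{bits:>2}-bit: {size:6.2f} GiB of weights, {verdict} in {memory_gib} GiB")
```

Under these assumptions only the 4-bit variant fits, which is the basic reason quantized models are the usual choice on a single consumer GPU.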

For work, of course use OpenAI/Anthropic models, but for anything personal, anyone who considers themselves a "real engineer" should be running local models, using open harnesses and seeing what they can accomplish with these.

Even if open releases slow down or even stop, we have the foundation, right now, for smart engineers to squeeze something quite useful out of. Hopefully we'll one day figure out how to train large models in a federated way. But either way: not your weights, not your inference.


Replies

fullstackchris, today at 6:47 AM

let's be genuine here: those local models are nowhere near the capabilities of true modern LLMs like codex 5.5 and fable 5

but i also don't doubt that in a few years' time models with those benchmarks will be able to run locally

still many many breakthroughs to be had
