> anything you pick up second-hand will still deprecate at that pace
Not really? The people who do local inference most (from what I've seen) are owners of Apple Silicon and Nvidia hardware. Apple Silicon has ~7 years of decent enough LLM support under its belt, and Nvidia is only now starting to deprecate 11-year-old GPU hardware in its drivers.
If you bought a decently powerful inference machine 3 or 5 years ago, it's probably still plugging away with great tok/s. Maybe even faster inference because of MoE architectures or improvements in the backend.
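To put a rough number on "great tok/s": if you want to sanity-check your own machine, a timing loop like this with llama-cpp-python is enough. The model path, quant, and context size below are just placeholders for whatever GGUF you actually have, so treat it as a sketch rather than a benchmark suite:

    # Rough tokens/sec measurement with llama-cpp-python.
    # Model path and settings are placeholders -- point it at your own GGUF.
    import time
    from llama_cpp import Llama

    llm = Llama(
        model_path="models/your-model.Q4_K_M.gguf",  # placeholder path
        n_gpu_layers=-1,   # offload all layers to Metal/CUDA if available
        n_ctx=4096,
    )

    prompt = "Explain the difference between MoE and dense transformer models."
    start = time.time()
    out = llm(prompt, max_tokens=256)
    elapsed = time.time() - start

    generated = out["usage"]["completion_tokens"]
    print(f"{generated} tokens in {elapsed:.1f}s -> {generated / elapsed:.1f} tok/s")

Run it a couple of times so the model weights are warm; the first run includes load time and will look worse than steady-state generation.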
> If you bought a decently powerful inference machine 3 or 5 years ago, it's probably still plugging away with great tok/s.
I think this is the difference between people who embrace hobby LLMs and people who don’t:
The token/s output speed for large models on affordable local hardware is not great for me. I already wish the cloud-hosted solutions were several times faster. Any time I go to a local model, it feels like I'm writing e-mails back and forth with an LLM, not working with it.
And also, the first Apple M1 chip was released less than 5 years ago, not 7.
People on HN do a lot of wishful thinking when it comes to the macOS LLM situation. I feel like most of the people touting the Mac's ability to run LLMs are either impressed that they run at all, doing fairly simple tasks, or just messing around with a toy model where it doesn't matter if it gets things wrong.
And that’s fine! But then people come into the conversation from Claude Code and think there’s a way to run a coding assistant on Mac, saying “sure it won’t be as good as Claude Sonnet, but if it’s even half as good that’ll be fine!”
And then they realize that the heavily quantized models you can run on a Mac (one that isn't a $6,000 beast) can't invoke tools properly and try to "bridge the gap" by hallucinating tool outputs, and it becomes clear that the models small enough to run locally aren't "20-50% as good as Claude Sonnet"; they're like toddlers by comparison.
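For anyone who hasn't watched this fail first-hand: "invoking tools properly" means the model has to emit a structured tool call that the client parses and executes, not answer in prose or make up the result. Here's a rough sketch of that round trip against a local OpenAI-compatible server (llama.cpp's server, Ollama, etc.). The base URL, model name, and read_file tool are placeholders, and the branch where tool_calls comes back empty is exactly the failure mode I'm describing:

    # Minimal tool-calling round trip against a local OpenAI-compatible endpoint.
    # Base URL, model name, and the read_file tool are placeholders.
    import json
    from openai import OpenAI

    client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")

    tools = [{
        "type": "function",
        "function": {
            "name": "read_file",
            "description": "Read a file from the workspace",
            "parameters": {
                "type": "object",
                "properties": {"path": {"type": "string"}},
                "required": ["path"],
            },
        },
    }]

    resp = client.chat.completions.create(
        model="local-model",  # placeholder
        messages=[{"role": "user", "content": "What's in src/main.py?"}],
        tools=tools,
    )

    calls = resp.choices[0].message.tool_calls
    if not calls:
        # The failure mode: the model wrote prose (or invented file contents)
        # instead of asking the client to actually run the tool.
        print("No tool call emitted:", resp.choices[0].message.content)
    else:
        args = json.loads(calls[0].function.arguments)
        print("Model asked to read:", args["path"])

The frontier models reliably take the else branch with well-formed JSON arguments; the small quantized ones are the reason the if branch exists.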
People need to be clearer about what they mean when they say they're running models locally. If you want to build an image captioner, fine, go ahead, grab Gemma 7B or something. If you want an assistant you can talk to that will give you advice or help you with arbitrary tasks for work, that's not something that's on the menu.
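To be concrete about the end of the menu that is available: a narrow, single-purpose task like image captioning really does run fine on modest hardware. Something like this, using a small BLIP model through the transformers image-to-text pipeline (BLIP rather than Gemma here, purely as an example of a small model that handles the job; the image path is a placeholder):

    # Local image captioning with a small off-the-shelf model.
    # BLIP via the transformers "image-to-text" pipeline, as one example of a
    # narrow task that runs comfortably on consumer hardware.
    from transformers import pipeline

    captioner = pipeline(
        "image-to-text",
        model="Salesforce/blip-image-captioning-base",
    )

    result = captioner("photo.jpg")  # placeholder path to a local image
    print(result[0]["generated_text"])

That kind of thing works today. It's the "replace my cloud coding assistant" expectation that falls apart.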