Hacker News

Uehreka · 08/08/2025 · 2 replies

People on HN do a lot of wishful thinking when it comes to the macOS LLM situation. I feel like most of the people touting the Mac’s ability to run LLMs are either impressed that they run at all, are doing fairly simple tasks, or just have a toy model they like to mess around with and it doesn’t matter if it messes up.

And that’s fine! But then people come into the conversation from Claude Code and think there’s a way to run a coding assistant on Mac, saying “sure it won’t be as good as Claude Sonnet, but if it’s even half as good that’ll be fine!”

And then they realize that the heavily quantized models you can run on a Mac (one that isn’t a $6,000 beast) can’t invoke tools properly, try to “bridge the gap” by hallucinating tool outputs, and it becomes clear that the models small enough to run locally aren’t “20–50% as good as Claude Sonnet”; they’re like toddlers by comparison.
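To make the failure mode concrete: “invoking tools properly” means the model must emit machine-parseable output (typically JSON naming a tool and its arguments) that an agent harness can execute. Here is a minimal, hypothetical sketch of the strict validation an agent performs — the tool names and schema format are made up for illustration, not taken from any particular framework:

```python
import json

# Hypothetical tool schema: the assistant must emit JSON shaped like
#   {"tool": "read_file", "args": {"path": "..."}}
TOOLS = {"read_file": {"path": str}, "run_shell": {"cmd": str}}

def parse_tool_call(model_output: str):
    """Return (tool, args) if the output is a well-formed call, else None."""
    try:
        call = json.loads(model_output)
    except json.JSONDecodeError:
        return None  # small models often emit prose or broken JSON here
    tool = call.get("tool")
    schema = TOOLS.get(tool)
    if schema is None or not isinstance(call.get("args"), dict):
        return None
    args = call["args"]
    if set(args) != set(schema):
        return None  # missing or extra arguments
    if not all(isinstance(args[k], t) for k, t in schema.items()):
        return None  # wrong argument types
    return tool, args

# A well-formed call passes; the chatty variants small models tend to
# produce fail every branch above and the agent loop stalls.
ok = parse_tool_call('{"tool": "read_file", "args": {"path": "a.txt"}}')
bad = parse_tool_call('Sure! I will read the file for you: read_file(a.txt)')
```

Frontier models clear this bar reliably over long multi-step sessions; heavily quantized local models frequently don’t, which is why the gap feels larger than benchmark scores suggest.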

People need to be more clear about what they mean when they say they’re running models locally. If you want to build an image-captioner, fine, go ahead, grab Gemma 7b or something. If you want an assistant you can talk to that will give you advice or help you with arbitrary tasks for work, that’s not something that’s on the menu.


Replies

EagnaIonat · 08/09/2025

> I feel like most of the people touting the Mac’s ability to run LLMs are either impressed that they run at all, are doing fairly simple tasks, or just have a toy model they like to mess around with and it doesn’t matter if it messes up.

I feel like you haven't actually used it. Your comment may have been true 5 years ago.

> If you want an assistant you can talk to that will give you advice or help you with arbitrary tasks for work, that’s not something that’s on the menu.

You can use a RAG approach (e.g. with Milvus as the vector store) and LoRA adapters to dramatically improve the accuracy of the answers if needed.
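For readers unfamiliar with the RAG pattern: you retrieve the documents most similar to the query and prepend them to the local model’s prompt. The sketch below is a toy stand-in — it uses bag-of-words counts and cosine similarity in memory, where a real setup would use a sentence-embedding model with Milvus doing the vector search:

```python
import math
from collections import Counter

# Toy "embedding": bag-of-words term counts. A real pipeline would use a
# sentence-embedding model and store/search the vectors in Milvus instead.
def embed(text: str) -> Counter:
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, docs: list[str], k: int = 1) -> list[str]:
    """Return the k docs most similar to the query (the 'R' in RAG)."""
    q = embed(query)
    ranked = sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]

docs = [
    "Milvus is a vector database for similarity search.",
    "LoRA adapters fine-tune a base model cheaply.",
    "Open WebUI is a front end for local models.",
]
context = retrieve("how do I fine-tune with LoRA", docs)
# The retrieved passage gets prepended to the local model's prompt, which
# grounds the answer and reduces hallucination.
```

The retrieval step is why a modest local model can still give accurate answers over your own documents: the facts come from the store, and the model only has to synthesize them.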

Locally you can run multiple models, as many times as you like, without having to worry about per-token costs.

You also have the likes of Open WebUI, which layers numerous features on top of a chat interface if you don't want to write code.

I have a very old M1 MBP with 32GB, and I've built numerous applications on it for custom work. It does the job fine and speed is not an issue. It's not good enough to do a LoRA build, but I have a more recent laptop for that.

I doubt I am the only one.

bigyabai · 08/08/2025

I agree completely. My larger point is that Apple's and Nvidia's hardware has depreciated more slowly, because they've been shipping highly dense chips for a while now. Apple's software situation, by contrast, is utterly derelict and cannot seriously be mentioned in the same sentence as CUDA.

For inference purposes, though, compute shaders have worked fine for all 3 manufacturers. It's really only Nvidia users that benefit from the wealth of finetuning/training programs that are typically CUDA-native.