Hacker News

Hamuko · yesterday at 2:27 PM

I've tried to use a local LLM on an M4 Pro machine and it's quite painful. Not surprised that people into LLMs would pay for tokens instead of trying to force their poor MacBooks to do it.


Replies

atwrk · yesterday at 2:40 PM

Local LLM inference is all about memory bandwidth, and an M4 Pro only has about as much as a Strix Halo or a DGX Spark. That's why the older M-series Ultras are popular with the local-LLM crowd.
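To make the bandwidth point concrete, here's a back-of-envelope sketch. For a dense model, every generated token streams essentially all the weights through memory, so bandwidth over model size gives a rough ceiling on decode speed. The bandwidth figures and model size below are illustrative assumptions, not measured specs for any of the machines named above.

```python
# Rough upper bound on decode throughput for a dense LLM:
# each generated token must read all weights from memory once,
# so tokens/sec ≈ memory bandwidth / model size in bytes.
# Numbers here are illustrative assumptions, not official specs.

def max_tokens_per_sec(bandwidth_gb_s: float, model_size_gb: float) -> float:
    """Bandwidth-bound ceiling on decode speed (tokens per second)."""
    return bandwidth_gb_s / model_size_gb

# Hypothetical machines running a 16 GB (quantized) model:
for name, bw in [("~270 GB/s class", 270.0), ("~800 GB/s class", 800.0)]:
    print(f"{name}: ~{max_tokens_per_sec(bw, 16.0):.0f} tok/s ceiling")
```

Real throughput lands below this ceiling (compute, KV-cache reads, etc.), but it explains why doubling bandwidth roughly doubles tokens per second at the same model size.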

usagisushi · yesterday at 5:05 PM

Qwen 3.5 35B-A3B and 27B have changed the game for me. I expect we'll see something comparable to Sonnet 4.6 running locally sometime this year.

freeone3000 · yesterday at 2:53 PM

I’m super happy with it for embedding, image recognition, and semantic video segmentation tasks.

giancarlostoro · yesterday at 2:36 PM

What are the other specs, and what does your setup look like? You need a minimum of 24 GB of RAM to run models that are 16 GB or smaller.
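The sizing rule of thumb above can be sketched as a quick estimate: weights take roughly params × bits-per-weight / 8 bytes, plus headroom for the KV cache, activations, and the OS. The 25% overhead factor below is my own assumption for illustration, not a figure from this thread.

```python
# Back-of-envelope RAM estimate for running a quantized model locally.
# Assumption: weights plus ~25% extra for KV cache, activations, and
# runtime overhead. Actual headroom needed varies with context length.

def model_ram_gb(params_b: float, bits_per_weight: float,
                 overhead: float = 1.25) -> float:
    """Estimated resident size in GB for params_b billion parameters."""
    weight_gb = params_b * bits_per_weight / 8  # 1B params ≈ 1 GB at 8-bit
    return weight_gb * overhead

# e.g. a 27B model at 4-bit quantization:
print(f"{model_ram_gb(27, 4):.1f} GB")  # prints "16.9 GB"
```

That lines up with the comment's rule of thumb: a model occupying ~16 GB wants a 24 GB machine once you leave room for everything else.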

andoando · yesterday at 3:58 PM

Local LLMs are useful for stuff like tool calling.
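For anyone unfamiliar with the pattern: the model emits a structured request (commonly JSON), the host program runs the matching function, and the result goes back to the model. A minimal sketch, where `fake_model_reply` stands in for whatever your local inference call returns, and the JSON shape is a simplified assumption rather than any specific runtime's API:

```python
import json

# Tools the host exposes to the model (toy examples).
TOOLS = {
    "get_time": lambda args: "14:27",
    "add": lambda args: str(args["a"] + args["b"]),
}

def dispatch(reply: str) -> str:
    """If the model reply parses as a tool call, run the tool;
    otherwise treat it as a plain-text answer."""
    try:
        call = json.loads(reply)
    except json.JSONDecodeError:
        return reply  # ordinary text, no tool requested
    return TOOLS[call["tool"]](call.get("args", {}))

# Stand-in for a local model's output:
fake_model_reply = '{"tool": "add", "args": {"a": 2, "b": 3}}'
print(dispatch(fake_model_reply))  # prints "5"
```

Since the tool runs locally and the loop is latency-tolerant, a small local model can be good enough here even when it isn't for open-ended generation.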
