Hacker News

efficax · yesterday at 7:36 PM · 2 replies

qwen3.6 does a good job locally, except it can take 20-30 minutes to respond to a prompt on a Mac Studio with 32 GB of RAM.


Replies

smcleod · yesterday at 9:21 PM

Apple Silicon before the M4 does not have matmul instructions, which makes prompt processing very slow. It's quite different on the M5, much like using an NVIDIA GPU.

2ndorderthought · yesterday at 7:56 PM

Yeah, you probably do want to use a GPU for models of that size.

I also wonder what quantization you are using. If you haven't tried other quants, I really would.
