efficax • yesterday at 7:36 PM • 2 replies

qwen3.6 does a good job locally, except it can take 20-30 minutes to respond to a prompt on a Mac Studio with 32 GB of RAM.

Replies

smcleod • yesterday at 9:21 PM

Apple Silicon before the M4 does not have matmul instructions, which makes prompt processing very slow. It's quite different on the M5, much like using an NVIDIA GPU.

2ndorderthought • yesterday at 7:56 PM

Yeah, you probably do want to use a GPU for models of that size.

I also wonder what quantization you are using. If you haven't tried other quants, I really would.
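
For a rough sense of why the quant matters on a 32 GB machine, here is a back-of-the-envelope sketch of weight size versus bit width. The parameter count is a made-up placeholder (the thread doesn't say which qwen3.6 variant is involved), and the bit widths are nominal; real quant formats add some per-block overhead, and the KV cache and OS need memory on top of the weights.

```python
def weight_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights alone, in GB (10^9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


# Hypothetical parameter count, for illustration only; the thread does not
# say which model variant is being run.
N_PARAMS = 30e9

# Nominal bit widths; actual quantized files are somewhat larger.
for bits in (16, 8, 4):
    print(f"{bits:>2}-bit: ~{weight_size_gb(N_PARAMS, bits):.0f} GB")
```

With those placeholder numbers, only the 4-bit weights leave meaningful headroom in 32 GB. If the weights plus cache don't fit, the machine may start swapping, which could be one reason for very long response times.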
