Hacker News

2ndorderthought · yesterday at 7:56 PM

Yeah, you probably do want to use a GPU for models of that size.

I also wonder what quantization you're using. If you haven't tried other quants, I really would.


Replies

efficax · yesterday at 8:00 PM

This is qwen3.6:27b-coding-nvfp4, and it's only an M1. If they ever ship an M5 Studio with 96 GB of RAM, that's my next upgrade path for local LLM experiments.

You can get work done with them if you have a harness that can drive outcomes without needing feedback (I've been building a TDD red-to-green agent harness lately that is very effective when given a good plan upfront). So if you can stand waiting a few days for results that would take only hours with a model deployed on frontier Nvidia hardware, you can get results this way.
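A red-to-green loop like that can be sketched in a few lines. This is a minimal illustration, not the commenter's actual harness; the function names (`run_tests`, `propose_fix`, `apply_fix`) are hypothetical, and in a real setup `propose_fix` would be an LLM call and `run_tests` would shell out to a test runner:

```python
def red_to_green(run_tests, propose_fix, apply_fix, max_iters=10):
    """Drive failing (red) tests to passing (green) by iterating fixes.

    run_tests()         -> (passed: bool, output: str)   e.g. a pytest run
    propose_fix(output) -> a candidate patch (an LLM call in a real harness)
    apply_fix(patch)    -> applies the patch to the working tree
    Returns the number of fix iterations used, or None if still red.
    """
    for i in range(max_iters):
        passed, output = run_tests()
        if passed:
            return i
        # Feed the failure output back to the model and apply its patch.
        apply_fix(propose_fix(output))
    passed, _ = run_tests()
    return max_iters if passed else None


# Toy demo: the "codebase" is a counter the tests want at 3,
# and each "patch" nudges it by 1.
state = {"value": 0}
run = lambda: (state["value"] == 3, f"value is {state['value']}, want 3")
apply_patch = lambda patch: state.update(value=state["value"] + patch)
print(red_to_green(run, lambda out: 1, apply_patch))  # → 3
```

The key property is that the loop only needs a pass/fail signal plus test output, so the model can run unattended for as long as the hardware takes, which is what makes slow local inference tolerable.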
