Hacker News

steinvakt2 · 01/21/2025

Also wondering about this. My company is giving me an MBP M4 Max with 128 GB in a couple of weeks. What can I run locally? I'm subscribed to OpenAI but usually end up burning through all 50 of my weekly o1 prompts.


Replies

svachalek · 01/21/2025

Q4_K_M is the quantization level where most models hit the best quality-to-size tradeoff; it works out to roughly 4.5 bits per parameter. So take the parameter count, multiply by 4.5/8, and that's approximately how many bytes of RAM you need just to load the weights; then add some headroom for context and processing. Short answer: any of the distilled models will run easily, but you still can't touch the raw one.
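To make that arithmetic concrete, here's a small back-of-the-envelope sketch in Python (not from the comment above). It assumes ~4.5 effective bits per parameter for Q4_K_M and a flat overhead allowance for context/KV cache; the model names and sizes are illustrative, assuming the thread is about DeepSeek-R1 and its distills.

```python
# Rough RAM estimate for a Q4_K_M-quantized model:
# parameters * ~4.5 bits / 8 bits-per-byte = weight size in bytes,
# plus an allowance for context (KV cache) and runtime overhead.

BITS_PER_PARAM = 4.5   # approximate effective rate for Q4_K_M
OVERHEAD_GB = 4.0      # rough context/processing allowance (assumption)

def q4_k_m_ram_gb(params_billions: float) -> float:
    """Estimated GB of RAM to load and run a model at Q4_K_M."""
    weights_gb = params_billions * 1e9 * BITS_PER_PARAM / 8 / 1e9
    return weights_gb + OVERHEAD_GB

# Illustrative sizes, assuming DeepSeek-R1 distills vs. the full model:
for name, size_b in [
    ("R1-Distill-Qwen-32B", 32),
    ("R1-Distill-Llama-70B", 70),
    ("DeepSeek-R1 (full, 671B)", 671),
]:
    print(f"{name}: ~{q4_k_m_ram_gb(size_b):.0f} GB")
```

Under those assumptions, a 70B distill needs around 43 GB (fits comfortably in 128 GB of unified memory), while the full 671B model needs on the order of 380 GB even at 4-bit, which matches the "can't touch the raw one" conclusion.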