Hacker News

Myrmornis today at 4:47 AM

Can anyone give any tips for getting something that runs fairly fast under ollama? It doesn't have to be very intelligent.

When I tried gpt-oss and Qwen under ollama on an M2 Mac, the main problem was that they were extremely slow. But I do have a need for a free local model.


Replies

parthsareen today at 5:14 AM

How much RAM are you running with? Qwen3 and gpt-oss:20b punch a good bit above their weight. I personally use them for small agents.
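On Apple Silicon, speed mostly comes down to whether the whole model fits in unified memory, so a smaller quantized tag is often the quickest fix. A minimal sketch with the ollama CLI (the `qwen3:4b` tag is an assumption; substitute any small model from the library), using `--verbose` to see the measured generation speed:

```shell
# Pull a small model that fits comfortably in unified memory.
# (qwen3:4b is an assumed tag; check `ollama list` or the model library.)
ollama pull qwen3:4b

# --verbose makes ollama print timing stats after each response,
# including the "eval rate" in tokens/s.
ollama run qwen3:4b --verbose
```

If the eval rate is still low, the model is likely spilling out of memory; dropping to an even smaller tag or a lower-bit quantization is the usual next step.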

am17an today at 4:56 AM

Use llama.cpp? I get 250 tok/s on gpt-oss using a 4090; not sure about Mac speeds.
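Running llama.cpp directly on a Mac is straightforward since Metal support is enabled by default. A rough sketch (the `model.gguf` path is a placeholder for whatever GGUF file you download; `-ngl 99` offloads all layers to the GPU):

```shell
# Build llama.cpp (Metal is on by default for macOS builds).
git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp
cmake -B build && cmake --build build --config Release

# Run a GGUF model with all layers offloaded to the GPU via -ngl.
# The timing summary printed at the end reports tokens per second.
./build/bin/llama-cli -m model.gguf -ngl 99 -p "Hello" -n 128
```

This skips ollama's layer entirely, which can make it easier to see where time is going, since llama.cpp prints separate prompt-processing and generation rates.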