heipei • today at 4:14 PM • 2 replies

Depends on what you mean by "local". On your MacBook, large dense models like Qwen 3.6 27B will be slow, sure. On a local workstation with a dedicated RTX card you can get more than 100 tokens per second (tps), which is more than good enough to work with, and faster than cloud models in many cases.

Replies

jstanley • today at 4:15 PM

But how smart is it? The people running local models never seem to mention that those models are way dumber than cloud models.

I don't care how many tokens per second of nonsense it can generate.

c0rruptbytes • today at 5:20 PM

I'm talking about the common use case that I think Hacker News people have:

You get a MacBook for work, so you run on the MacBook.

They're not going to start giving GPUs to employees to run local models.
