logoalt Hacker News

heipeitoday at 4:14 PM2 repliesview on HN

Depends on what you mean by "local". On your Macbook, large dense models like Qwen 3.6 27B will be slow, sure. On a local workstation with a dedicated RTX card you can get > 100 tps, which is more than good enough to work with it, and faster than cloud models in many cases.


Replies

jstanleytoday at 4:15 PM

But how smart is it? All the people running local models never seem to mention that they are way dumber than cloud models.

I don't care how many tokens per second of nonsense it can generate.

show 5 replies
c0rruptbytestoday at 5:20 PM

I'm talking about the common use case that I think hacker news people have:

you get a macbook for work, you run the macbook

they're not going to start giving GPUs to employees to run local models