Depends on what you mean by "local". On your MacBook, large dense models like Qwen 3.6 27B will be slow, sure. On a local workstation with a dedicated RTX card, you can get over 100 tokens per second (tps). That is more than fast enough to work with, and in many cases faster than cloud models.
I'm talking about what I think is the common use case for Hacker News people:
you get a MacBook for work, and you run the model on that MacBook.
Companies aren't going to start handing out GPUs to employees so they can run local models.
But how smart is it? The people running local models never seem to mention that those models are way dumber than cloud models.
I don't care how many tokens per second of nonsense it can generate.