> If you bought a decently powerful inference machine 3 or 5 years ago, it's probably still plugging away with great tok/s.
I think this is the difference between people who embrace hobby LLMs and people who don’t:
For large models, the token/s output speed on affordable local hardware just isn't good enough for me. I already wish the cloud-hosted solutions were several times faster. Any time I go to a local model it feels like I'm writing e-mails back and forth to an LLM, not working with it.
And also, the first Apple M1 chip was released less than 5 years ago, not 7.
> Any time I go to a local model it feels like I’m writing e-mails back and forth
Do you have a good accelerator? If you're offloading to a powerful GPU it shouldn't feel like that at all. I've gotten ChatGPT speeds from a 4060 running the gpt-oss 20B and Qwen3 30B models, both of which are competitive with OpenAI's last-gen models.
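
(If it helps anyone hitting the same wall: here's a minimal sketch of what full GPU offload looks like via llama-cpp-python. The GGUF filename and context size are placeholders, not specific recommendations.)

    from llama_cpp import Llama

    # Load a quantized GGUF model and push every layer onto the GPU.
    # n_gpu_layers=-1 means "offload all layers"; leave it at the default (0)
    # and you're back to CPU-only speeds, i.e. the e-mail feeling.
    llm = Llama(
        model_path="qwen3-30b-a3b-q4_k_m.gguf",  # placeholder filename
        n_gpu_layers=-1,
        n_ctx=8192,
    )

    out = llm.create_chat_completion(
        messages=[{"role": "user", "content": "Explain GPU offloading in one paragraph."}],
        max_tokens=256,
    )
    print(out["choices"][0]["message"]["content"])

Same idea with plain llama.cpp (-ngl 99 on llama-cli/llama-server) or Ollama, which offloads automatically when it detects a supported GPU.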
> the first Apple M1 chip was released less than 5 years ago
Core ML has been running on Apple-designed silicon for 8 years now, if we really want to get pedantic. But sure, actual LLM/transformer use is a more recent phenomenon.