> Any time I go to a local model it feels like I’m writing e-mails back and forth
Do you have a good accelerator? If you're offloading to a powerful GPU it shouldn't feel like that at all. I've gotten ChatGPT speeds from a 4060 running the OSS 20B and Qwen3 30B models, both of which are competitive with OpenAI's last-gen models.
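For anyone hitting the "writing e-mails" problem: with llama.cpp the usual culprit is layers not being offloaded to the GPU. A minimal sketch (the model filename and parameter values here are illustrative, not a recommendation):

```shell
# -ngl 99 offloads all layers to the GPU; without it inference
# falls back to CPU and feels exactly like e-mail latency.
# -c sets the context window. Substitute whichever GGUF quant you downloaded.
llama-server -m ./gpt-oss-20b-Q4_K_M.gguf -ngl 99 -c 8192
```

Ollama and LM Studio do the equivalent automatically when they detect a supported GPU.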
> the first Apple M1 chip was released less than 5 years ago
Core ML has been running on Apple-designed silicon for 8 years now, if we really want to get pedantic. But sure, actual LLM/transformer use is a more recent phenomenon.