I wish I could upvote this twice. We (devs) really REALLY need to consider on-device compute before going to the cloud for LLM inference.