What we want as developers: To implement functionality that uses a model for tasks like OCR, visual analysis, search, or re-ranking, without having to integrate a hosted LLM API and pay for it. Instead, we'd like to offer the functionality to users, possibly at no cost, and use their edge computing capacity to deliver it by calling local protocols and models.
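To make the developer side concrete, here is a minimal sketch of what calling such a local-model protocol could look like. Everything in it is hypothetical: no standard `localModels` API exists today, and the capability names and method signatures below are illustrative assumptions, not a real platform interface.

```typescript
// Hypothetical sketch: "localModels" stands in for whatever local-model
// protocol an OS or browser might one day expose. None of these names
// correspond to a real, shipping API.

interface LocalModelSession {
  // Run one inference on the user's own hardware (CPU/GPU/NPU).
  prompt(input: string | Blob): Promise<string>;
}

interface LocalModels {
  // Ask the device whether it can serve a given capability;
  // resolves to null if no suitable on-device model is installed.
  createSession(capability: "ocr" | "vision" | "rerank"): Promise<LocalModelSession | null>;
}

// Assumed to be provided by the platform in this hypothetical world.
declare const localModels: LocalModels;

async function extractText(image: Blob): Promise<string | null> {
  const session = await localModels.createSession("ocr");
  if (session === null) {
    // No local model available: the app can degrade gracefully
    // or fall back to its own (paid) cloud path.
    return null;
  }
  // Inference runs on the user's device: no API key, no per-call
  // cost to the developer, and the image never leaves the machine.
  return session.prompt(image);
}
```

The point of the shape here is the capability negotiation: the app asks for a task ("ocr"), not a specific model, so the user or the platform decides which on-device model actually serves it.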
What we want as users: To have advanced functionality without having to pay for a model or API, and without having to authenticate it separately in every app we use. We also want our data to stay on our devices.
What trainers of small models want: A way for users to get their models onto their devices, and potentially to pay for advanced, specialized, and highly performant on-device models instead of for APIs.
What NPUs actually seem to deliver at this point: filtering background noise from our microphones and blurring our camera backgrounds, using a watt or two less power than before.