That is exactly why Apple's on-device strategy is the only economically viable one. If every Siri request cost $0.01 for cloud inference, Apple would go bankrupt in a month. But if inference happens on the Neural Engine on the user's phone, the cost to Apple is zero (well, aside from R&D). This solves the problem of unmonetizable requests like "set a timer," which killed Alexa's economics.
On top of that, on-device models cut response times and can be truly private if the developer chooses.
The greed to lock customers in early for cheap or free, in hopes of forcing them onto a subscription later, absolutely ruined the previous era of assistants. It could have been great with offline inference, which would have fostered competition. Instead we got mediocre assistants that got worse each year.