While I'm happy with Apple introducing this abstraction, my main concern was with local models.
I'd love to use Gemma 4 as an example, but think of a user: if 10 apps each use the same model and each downloads its own copy, the phone will be bloated.
I still didn't understand whether Apple provided a way for multiple apps to use the same on-device model (without tricky namespaces and permissions).
I didn't see anything suggesting that's the case.
That's a great opportunity for Apple to provide a universal unique model ID protocol and some shared storage space to allow devs to register models.
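To make the idea concrete, here is a minimal sketch of what such a registry could look like, assuming the "universal model ID" is simply a hash of the weight file. `SharedModelStore`, `register`, and `release` are hypothetical names for illustration; nothing here is an Apple API.

```python
import hashlib
from pathlib import Path


class SharedModelStore:
    """Hypothetical shared store: one copy per distinct model, keyed by content hash."""

    def __init__(self, root: Path) -> None:
        self.root = root
        self.root.mkdir(parents=True, exist_ok=True)
        # model ID -> set of app identifiers that registered that model
        self.users: dict[str, set[str]] = {}

    @staticmethod
    def model_id(weights: bytes) -> str:
        # The ID is derived from the bytes, so identical weights get the
        # same ID no matter which app or vendor supplies them.
        return "sha256:" + hashlib.sha256(weights).hexdigest()

    def _path(self, mid: str) -> Path:
        return self.root / mid.replace(":", "_")

    def register(self, app_id: str, weights: bytes) -> str:
        """Store the weights only if no other app has already done so."""
        mid = self.model_id(weights)
        path = self._path(mid)
        if not path.exists():
            path.write_bytes(weights)
        self.users.setdefault(mid, set()).add(app_id)
        return mid

    def release(self, app_id: str, mid: str) -> None:
        """Drop one app's claim; delete the file when nobody uses it anymore."""
        apps = self.users.get(mid, set())
        apps.discard(app_id)
        if not apps:
            self.users.pop(mid, None)
            self._path(mid).unlink(missing_ok=True)

    def bytes_on_disk(self) -> int:
        return sum(p.stat().st_size for p in self.root.iterdir())
```

With this design, ten apps registering byte-identical weights end up with one file on disk, and the file disappears once the last app releases it. A real system would also need sandboxing, integrity checks, and licensing rules, which this sketch ignores.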
Check out “Bring an LLM provider to the Foundation Models framework” - https://developer.apple.com/videos/play/wwdc2026/339
That is exactly what foundation models are, yes. Same on Android with AICore, which uses Gemma underneath: apps can query the LLM and receive responses back rather than bundling their own model.
The apps can use the system-provided on-device model using the same framework and APIs, but there are no affordances to deduplicate custom models between apps.
OK, but don't expect Anthropic to help with local models; that'll be something Apple rolls out themselves, if at all.
Sounds ripe for block-level deduplication. :D Or an API that lets you request a model and handles caching.
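For anyone unfamiliar with block-level deduplication, a rough sketch of the idea: split each file into fixed-size blocks, hash each block, and keep only one copy per distinct hash. `BlockStore` and the 4096-byte block size are assumptions for illustration, not how any particular file system does it.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed block size; real file systems pick their own


class BlockStore:
    """Toy in-memory store that keeps each distinct block only once."""

    def __init__(self) -> None:
        self.blocks: dict[str, bytes] = {}      # block hash -> block contents
        self.files: dict[str, list[str]] = {}   # file name -> ordered block hashes

    def put(self, name: str, data: bytes) -> None:
        hashes = []
        for offset in range(0, len(data), BLOCK_SIZE):
            block = data[offset:offset + BLOCK_SIZE]
            digest = hashlib.sha256(block).hexdigest()
            # setdefault stores the block only the first time this hash is seen
            self.blocks.setdefault(digest, block)
            hashes.append(digest)
        self.files[name] = hashes

    def get(self, name: str) -> bytes:
        return b"".join(self.blocks[h] for h in self.files[name])

    def logical_size(self) -> int:
        """Bytes the apps think they are storing."""
        return sum(len(self.blocks[h]) for hs in self.files.values() for h in hs)

    def physical_size(self) -> int:
        """Bytes actually kept after deduplication."""
        return sum(len(b) for b in self.blocks.values())


def make_model(num_blocks: int = 8) -> bytes:
    # Deterministic stand-in for model weights: num_blocks distinct 4096-byte blocks
    return b"".join(hashlib.sha256(bytes([i])).digest() * 128 for i in range(num_blocks))


store = BlockStore()
model = make_model()
variant = model[:-BLOCK_SIZE] + b"\xff" * BLOCK_SIZE  # differs only in the last block
store.put("appA/model.bin", model)
store.put("appB/model.bin", model)
store.put("appC/model.bin", variant)
print(store.logical_size(), store.physical_size())
```

The catch is that this only helps when apps ship byte-identical blocks. Two apps bundling the same model in different quantizations or file formats would share little or nothing.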
I think that's what they are trying to avoid. If you need on-device intelligence, their pitch was "The model the device already has is best", and if you need something more specific, an adapter (aka a fine-tune/LoRA) is best.
They were wrong when their on-device model was way behind. They still might be right in the long term.
While multiple apps I use might need Gemma 4 E4B, I use dozens of apps, and app devs can choose from hundreds of models. A shared cache might reduce size a little when there's overlap, but the core problem still exists. If each app chooses its own model, disk usage and memory swapping explode.
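A back-of-the-envelope sketch of how little a shared cache saves, assuming each app bundles exactly one model and picks it uniformly at random. The 36 apps and 200 models are made-up stand-ins for "dozens" and "hundreds"; real choices skew toward popular models, so actual overlap would be higher than this estimate.

```python
def expected_distinct_models(num_apps: int, num_models: int) -> float:
    """Expected number of different models installed across all apps.

    A given model is skipped by one app with probability (1 - 1/m),
    so it is skipped by all n apps with probability (1 - 1/m) ** n.
    """
    miss_all = (1 - 1 / num_models) ** num_apps
    return num_models * (1 - miss_all)


def shared_cache_saving(num_apps: int, num_models: int) -> float:
    """Fraction of model copies a perfect shared cache would eliminate."""
    return 1 - expected_distinct_models(num_apps, num_models) / num_apps


# Assumed illustrative numbers, not measurements
apps, models = 36, 200
print(f"distinct models: {expected_distinct_models(apps, models):.1f}")
print(f"saving from sharing: {shared_cache_saving(apps, models):.1%}")
```

Under these assumptions the 36 apps still need roughly 33 distinct models, so perfect sharing removes less than a tenth of the copies.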
It's probably better for device manufacturers to bake in a default. I'm not proposing they limit you from using others, but one shared default might be the best developer/user experience for 99% of apps.
- Being warm in memory is the single biggest perf speedup you can get, and a default is much more likely to be warm.
- "Best model" is usually "best model for this device" given both RAM and compute. A developer can't test every device but Apple can/will.
- Each model needs to be optimized for the hardware (what's running on ANE, what's running on Metal, what's running on CPU). The default gets optimized.
- If you need a custom model, a LoRA is probably best (30MB, benefits from all of the above)
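To see why an adapter stays in the tens of megabytes, here is a rough size estimate. A LoRA replaces the update to each adapted weight matrix with two low-rank factors, so it stores rank × (d_in + d_out) parameters per matrix instead of d_in × d_out. The layer count, dimensions, and rank below are hypothetical, not the configuration of any real system model.

```python
def lora_size_bytes(
    num_layers: int,
    matrices_per_layer: int,
    d_in: int,
    d_out: int,
    rank: int,
    bytes_per_param: int = 2,  # assumes 16-bit weights
) -> int:
    """Approximate on-disk size of a LoRA adapter."""
    # Two factors per adapted matrix: (d_out x rank) and (rank x d_in)
    params_per_matrix = rank * (d_in + d_out)
    total_params = num_layers * matrices_per_layer * params_per_matrix
    return total_params * bytes_per_param


# Hypothetical model: 32 layers, 4 adapted 3072x3072 projections per layer, rank 16
size = lora_size_bytes(num_layers=32, matrices_per_layer=4, d_in=3072, d_out=3072, rank=16)
print(f"{size / 1e6:.1f} MB")
```

With these made-up numbers the adapter comes to about 25 MB, the same order of magnitude as the 30MB figure above, while the base weights it rides on are shared by every app.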
You could say the default should be swappable, but that's more a Linux ideal than an Apple one, so I doubt we'll ever see that. Plus there are real downsides: intentionally or not, prompts end up optimized for the model they were developed against, so swapping the default system model would degrade every app.