Maybe I’m missing it, but the page is really light on technical information. Is this a quantized or distilled version of a larger LLM? Which one? How many parameters? What quantization? How many tokens/s can I expect? What are the VRAM requirements? Etc.
I tried it on my iPhone 13 mini. I believe the model you get depends on your phone’s specs. For me it downloaded a ~1.3 GB model, which can speak in complete sentences but can’t do much beyond that. Can’t blame them though: that model is tiny, and my device wasn’t designed for this.
I have the same questions. After installing the app, it downloads 2.5 GB of data. I presume this is the model.
You can see what it uses here - https://github.com/ente-io/ente/blob/main/web/apps/ensu/src/...
Either LFM2.5-1.6B-4bit, Qwen3.5-2B-8bit, or Qwen3.5-4B-4bit.
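For a rough sense of how those names relate to download size, here's a back-of-the-envelope sketch. It assumes the names encode parameter count (in billions) and bits per weight, which is my reading of them rather than anything confirmed by the repo. `approx_weight_gb` and `CANDIDATES` are made-up names for illustration.

```python
def approx_weight_gb(params_billions: float, bits_per_weight: int) -> float:
    """Raw weight size in GB (10^9 bytes): parameters * bits / 8.

    Ignores quantization metadata, tokenizer files, and any tensors
    stored at higher precision, so it is a lower bound, not a file size.
    """
    total_bits = params_billions * 1e9 * bits_per_weight
    return total_bits / 8 / 1e9


# ASSUMPTION: (billions of parameters, bits per weight) read off the model names.
CANDIDATES = {
    "LFM2.5-1.6B-4bit": (1.6, 4),
    "Qwen3.5-2B-8bit": (2.0, 8),
    "Qwen3.5-4B-4bit": (4.0, 4),
}

estimates = {
    name: approx_weight_gb(params, bits)
    for name, (params, bits) in CANDIDATES.items()
}

for name, gb in estimates.items():
    print(f"{name}: ~{gb:.1f} GB of raw weights")
```

That works out to roughly 0.8 GB, 2.0 GB, and 2.0 GB of raw weights. Real files are usually larger, since quantized formats store per-block scales and often keep some tensors (such as embeddings) at higher precision, so treat these as lower bounds rather than expected download sizes.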