The only way to have hardware reach this sort of efficiency is to embed the model in the hardware itself.
This exists[0], but the chip in question is physically large and won't fit on a phone.
I think for many reasons this will become the dominant paradigm for end user devices.
Moore's law will shrink it to 8mm soon. I think it'll be like a microSD card you plug in.
Or we develop a new silicon process that can mimic the synaptic weights found in biology. Synapses have plasticity.
That's actually pretty cool, but I'd hate to freeze a model's weights into silicon without an incredibly specific and broad use case.
I think you're ignoring the inevitable march of progress. Phones will get big enough to hold it soon.