It's crazy to see a 400B model running on an iPhone. But moving forward, as the information density and architectural efficiency of smaller models continue to increase, getting high-quality, real-time inference on mobile is going to become trivial.
> moving forward, as the information density and architectural efficiency of smaller models continue to increase
If they continue to increase.
Probably 2x the speed on a Mac Studio this year if they double the NAND (or quadruple it?).
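For what it's worth, the "2x" guess follows directly if decode is bound by how fast the weights can be read off storage: tokens/sec ≈ read bandwidth / bytes read per token. A minimal back-of-envelope sketch of that arithmetic, where every number (quantization, active fraction, bandwidths) is an illustrative assumption rather than anything stated in the thread:

```python
# Rough back-of-envelope for bandwidth-bound inference where weights
# stream from flash. Every number below is an illustrative assumption.

total_params = 400e9      # 400B parameters, per the thread
bytes_per_param = 0.5     # ~4-bit quantization (assumption)
active_fraction = 0.05    # hypothetical MoE: ~5% of weights read per token
bytes_per_token = total_params * bytes_per_param * active_fraction  # ~10 GB

for nand_gbps in (4, 8, 16):  # hypothetical NAND read bandwidths in GB/s
    tok_per_s = nand_gbps * 1e9 / bytes_per_token
    print(f"{nand_gbps:>2} GB/s NAND -> {tok_per_s:.2f} tok/s")

# If nothing else bottlenecks first, doubling NAND read bandwidth
# doubles the token rate -- the "2x" above; quadrupling gives ~4x.
```

Under those assumptions the speedup is linear in storage bandwidth, which is why doubling (or quadrupling) the NAND maps straight onto the token rate.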