A future where we carry and manage just one device could be incredible. That said, today, even if iOS were less locked down and more capable of that, I think I’d find myself frustrated. I run local, on-device LLMs on my iPhone, and a heavily quantized 3B-parameter model causes the iPhone’s thermal management to throttle heavily after just a few prompts with low token counts, to the point where inference is slower than 1 token per second and the phone gets hot to the touch. Maybe the rumored half-iPhone, half-iPad device could be the eventual platform from which something like this emerges.