Hacker News

zozbot234, today at 12:03 PM

That's very impressive, but it's streaming weights in from flash storage. That's not really viable in a mobile context: it will use far too much power. Smaller models are much more applicable to typical use, perhaps with mid-sized models (like the Gemma4 26A4B model) streaming offloaded weights from SSD for the rare uses that call for slower "pro" inference.