Hacker News

ashwinnair99 · today at 2:57 PM

A year ago this would have been considered impossible. The hardware is moving faster than anyone's software assumptions.


Replies

cogman10 · today at 3:01 PM

This isn't a hardware feat, this is a software triumph.

They didn't make special purpose hardware to run a model. They crafted a large model so that it could run on consumer hardware (a phone).

mannyv · today at 3:46 PM

The software has real software engineers working on it instead of researchers.

Remember when people were arguing about whether to use mmap? What a ridiculous argument.

At some point someone will figure out how to tile the weights and the memory requirements will drop again.
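To illustrate the idea behind the mmap argument: memory-mapping lets the OS page weights in from disk on demand, so only the tiles actually touched occupy RAM. A minimal Python sketch (the file name, shapes, and tile size here are illustrative, not from any real model format):

```python
import os
import tempfile
import numpy as np

# Write a dummy "weights" matrix to disk (real checkpoints use
# mmap-friendly formats; this is just a raw float32 dump for demo).
path = os.path.join(tempfile.mkdtemp(), "weights.bin")
np.arange(1024, dtype=np.float32).reshape(32, 32).tofile(path)

# np.memmap maps the file into virtual memory instead of reading it
# all in; pages are faulted into RAM only when actually accessed.
weights = np.memmap(path, dtype=np.float32, mode="r", shape=(32, 32))

# Touch a single 8x8 tile: only the pages backing these rows are
# brought in, which is why tiling access patterns cut resident memory.
tile = np.asarray(weights[0:8, 0:8])
print(tile.sum())  # prints 7392.0
```

The same principle is what weight tiling would exploit at inference time: if the compute loop walks the matrix tile by tile, the working set stays at one tile rather than the whole layer.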

Aurornis · today at 4:10 PM

It wasn't considered impossible. There are examples of large MoE LLMs running on small hardware all over the internet, like giant models on Raspberry Pi 5.

It's just so slow that nobody pursued it seriously. It's fun to see these tricks implemented, but even on this 2025 top spec iPhone Pro the output is 100X slower than output from hosted services.

t00 · today at 6:19 PM

/FIFY A year ago this would have been considered impossible. The software is moving faster than anyone's hardware assumptions.

ottah · today at 4:42 PM

I mean, by any reasonable standard it still is. Almost any computer can run an LLM; it's just a matter of how fast, and 0.4k/s (peak before first token) is not really considered running. It's a demo, but practically speaking entirely useless.

iberator · today at 5:50 PM

Does the iPhone have some kind of hardware acceleration for neural networks/AI?
