A year ago this would have been considered impossible. The hardware is moving faster than anyone's software assumptions.
The software now has real software engineers working on it instead of just researchers.
Remember when people were arguing about whether to use mmap? What a ridiculous argument.
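The mmap trick the argument was about (popularized by llama.cpp for loading weights) can be sketched with numpy's memmap. The `weights.bin` file and its shape are made up for illustration; the point is that mapping the file does no bulk read — the OS pages weights in lazily as they're touched and can evict them under memory pressure, since the mapping is backed by the file:

```python
import numpy as np

# Hypothetical weights file standing in for a real checkpoint.
rng = np.random.default_rng(0)
weights = rng.standard_normal((1024, 256)).astype(np.float32)
weights.tofile("weights.bin")

# Mapping is cheap: no data is read here. Pages are faulted in
# only when a slice is actually accessed.
mapped = np.memmap("weights.bin", dtype=np.float32, mode="r",
                   shape=(1024, 256))

row = np.array(mapped[42])  # only the touched pages become resident
assert np.allclose(row, weights[42])
```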
At some point someone will figure out how to tile the weights and the memory requirements will drop again.
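Tiling in the sense this comment describes can be sketched as: split the weight matrix into row tiles, pull one tile at a time off storage (here via memmap), and accumulate the matvec, so peak working memory is one tile rather than the whole matrix. The file name and tile size are arbitrary choices for the sketch:

```python
import numpy as np

rng = np.random.default_rng(1)
W = rng.standard_normal((512, 128)).astype(np.float32)
x = rng.standard_normal(128).astype(np.float32)
W.tofile("w.bin")

mapped = np.memmap("w.bin", dtype=np.float32, mode="r", shape=(512, 128))

TILE = 64  # rows per tile; picked arbitrarily for the sketch
y = np.empty(512, dtype=np.float32)
for start in range(0, 512, TILE):
    tile = np.array(mapped[start:start + TILE])  # only this tile is in RAM
    y[start:start + TILE] = tile @ x

# Tiled result matches the full in-memory matvec.
assert np.allclose(y, W @ x, atol=1e-4)
```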
It wasn't considered impossible. There are examples of large MoE LLMs running on small hardware all over the internet, like giant models on Raspberry Pi 5.
It's just so slow that nobody pursued it seriously. It's fun to see these tricks implemented, but even on this 2025 top-spec iPhone Pro the output is 100x slower than output from hosted services.
FTFY: A year ago this would have been considered impossible. The software is moving faster than anyone's hardware assumptions.
I mean, by any reasonable standard it still is. Almost any computer can run an LLM; it's just a matter of how fast, and 0.4k/s (peak, before the first token) is not really considered running. It's a demo, but practically speaking it's entirely useless.
Does the iPhone have some kind of hardware acceleration for neural networks/AI?
This isn't a hardware feat, this is a software triumph.
They didn't make special purpose hardware to run a model. They crafted a large model so that it could run on consumer hardware (a phone).
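One of the standard software tricks for fitting a large model onto consumer hardware is weight quantization: store parameters in a few bits and dequantize on the fly. A minimal per-row symmetric int8 sketch, not the specific scheme any particular model actually ships:

```python
import numpy as np

rng = np.random.default_rng(2)
W = rng.standard_normal((256, 64)).astype(np.float32)  # fp32: 4 bytes/param

# Per-row symmetric int8 quantization: 1 byte/param plus one fp32 scale per row.
scales = np.abs(W).max(axis=1, keepdims=True) / 127.0
q = np.round(W / scales).astype(np.int8)

# Dequantize on the fly when a row is actually needed.
W_hat = q.astype(np.float32) * scales

assert q.nbytes == W.nbytes // 4       # 4x smaller weight storage
assert np.abs(W - W_hat).max() < 0.05  # small reconstruction error
```

Real schemes go further (4-bit groups, outlier handling), but the memory math is the same: fewer bits per weight is what turns a server-sized model into a phone-sized one.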