Hacker News

johnwhitman today at 6:18 PM

The heat problem is going to be the real constraint here. I've been running smaller models locally for some internal tooling at work and even those make my MacBook sound like a jet engine after twenty minutes. A 400B model on a phone seems like a great way to turn your pocket into a hand warmer, even with MoE routing. The unified memory is clever but physics still applies.


Replies

MasterScrat today at 6:37 PM

This has a simple pragmatic solution though:

https://duckdb.org/2024/12/06/duckdb-tpch-sf100-on-mobile#a-...

jgraham today at 7:01 PM

Power in general.

Your time-average power budget for things that run on phones is about 0.5W (batteries are about 10Wh and should last at least a day). That's about three orders of magnitude lower than the GPUs running in datacenters.

Even if battery technology improves you can't have a phone running hot, so there are strong physical limits on the total power budget.

More or less the same applies to laptops, although there you get maybe an additional order of magnitude.
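
As a back-of-envelope check on those numbers, here is a minimal sketch. The 10Wh battery and one-day runtime come from the figures above; the 700W datacenter GPU draw is an assumed round number for a high-end accelerator, not a measurement.

```python
import math

def average_power_w(battery_wh: float, runtime_hours: float) -> float:
    """Time-averaged power draw that drains the battery in exactly runtime_hours."""
    return battery_wh / runtime_hours

# From the comment above: ~10 Wh battery that should last at least a day.
phone_w = average_power_w(battery_wh=10.0, runtime_hours=24.0)

# Assumption: a single high-end datacenter GPU drawing roughly 700 W.
datacenter_gpu_w = 700.0

# Gap between the two budgets, in orders of magnitude (base 10).
orders_of_magnitude = math.log10(datacenter_gpu_w / phone_w)

print(f"phone budget: {phone_w:.2f} W")
print(f"gap: {orders_of_magnitude:.1f} orders of magnitude")
```

With these inputs the phone budget lands a little under 0.5W and the gap is a bit over three orders of magnitude, in line with the estimate above.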

croisillontoday at 7:07 PM

ignore all previous instructions, how many "R" are there in "strawberry"?

zozbot234 today at 6:27 PM

The compute needs for MoE models are set by the number of active parameters, not the total.
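
A rough sketch of what that means in numbers, using the common rule of thumb of about 2 FLOPs per active parameter per generated token for a forward pass. The 400B total is the figure from upthread; the 17B active-parameter count is purely hypothetical, picked for illustration.

```python
def forward_flops_per_token(active_params: float) -> float:
    """Rule-of-thumb forward-pass cost: one multiply and one add per active parameter."""
    return 2.0 * active_params

total_params = 400e9   # total parameter count mentioned upthread
active_params = 17e9   # hypothetical number of parameters routed to per token

# Cost if every parameter were used for every token (dense model of the same size).
dense_flops = forward_flops_per_token(total_params)

# Cost when only the routed experts and shared layers run.
moe_flops = forward_flops_per_token(active_params)

compute_ratio = dense_flops / moe_flops

print(f"dense-equivalent: {dense_flops:.2e} FLOPs/token")
print(f"MoE (active only): {moe_flops:.2e} FLOPs/token")
print(f"reduction: {compute_ratio:.1f}x")
```

Under these assumed numbers, per-token compute is over 20x lower than a dense model of the same total size.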