Hacker News

Gigachad · today at 5:01 AM

We still aren't going to be putting 200 GB of RAM in a phone within a couple of years to run those local models.


Replies

mh- · today at 5:17 AM

A lot of people are making the mistake of noticing that local models have been 12-24 months behind the SotA ones for a good portion of the last couple of years, and then drawing a dotted line and assuming that gap will continue to hold.

It simply doesn't. The SotA models are enormous now, and there's no free lunch on compression or quantization here.
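Back-of-envelope, using an assumed parameter count (frontier labs don't publish these, so the 1T figure below is purely illustrative):

    # Rough memory math for running a big LLM locally. The parameter
    # count is an assumption for illustration, not a published figure.
    ASSUMED_PARAMS = 1_000_000_000_000  # hypothetical ~1T-param frontier model

    bytes_per_param = {
        "fp16/bf16": 2.0,
        "int8":      1.0,
        "int4":      0.5,  # aggressive quantization
    }

    for fmt, b in bytes_per_param.items():
        print(f"{fmt:>9}: ~{ASSUMED_PARAMS * b / 1e9:,.0f} GB for weights alone")

    # fp16/bf16: ~2,000 GB for weights alone
    #      int8: ~1,000 GB for weights alone
    #      int4:   ~500 GB for weights alone
    # Even 4-bit leaves ~500 GB of weights before KV cache and
    # activations, i.e. roughly 4-8x a 64-128 GB machine.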

Opus 4.6-level capabilities are not coming to your laptop (even a 64-128 GB one) or your phone under the architecture that current popular LLMs use.

Now, that doesn't mean a much narrower-scoped model with very impressive results can't be delivered. But that narrower model won't have the same breadth of knowledge, and it's TBD whether the quality and outcomes seen with these models are achievable without that broad "world" knowledge.

It also doesn't preclude a new architecture or some other breakthrough. I'm simply stating that it won't happen with the current way of building these.

edit: forgot to mention the notion of ASIC-style models on a chip. I haven't been following this closely, but the last I saw, the power requirements were too steep for a mobile device.

jurmous · today at 6:07 AM

We don't need 200 GB of RAM in a phone to run big models, just 200 GB of storage, thanks to Apple's "LLM in a Flash" research.

See: https://x.com/danveloper/status/2034353876753592372
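For anyone who hasn't read it: the paper's core idea is to keep the weights in flash and page in only the small slice each token actually needs, since ReLU-style FFN activations are mostly sparse. A toy sketch of that load-on-demand idea (file name, sizes, and the active-row list are all made up, and the real paper adds row-column bundling and a learned sparsity predictor on top):

    import numpy as np

    D_MODEL, D_FF = 1024, 4096  # toy sizes

    # Stand-in for model weights sitting in flash storage.
    np.ones((D_FF, D_MODEL), dtype=np.float16).tofile("ffn_up.bin")

    # Memory-map the weights: nothing is loaded into RAM until a row
    # is actually touched; the OS pages it in from flash on demand.
    w_up = np.memmap("ffn_up.bin", dtype=np.float16, mode="r",
                     shape=(D_FF, D_MODEL))

    def ffn_forward(x, active_rows):
        # 'active_rows' would come from a cheap sparsity predictor:
        # for any given token most FFN neurons output zero, so only
        # a small fraction of rows ever needs to leave flash.
        rows = np.asarray(w_up[active_rows])  # flash reads happen here
        return np.maximum(rows @ x, 0.0)      # ReLU over the active subset

    x = np.ones(D_MODEL, dtype=np.float16)
    out = ffn_forward(x, active_rows=[0, 7, 42])  # touches 3 rows, not 4096

The tradeoff is that flash bandwidth and read latency now bound your tokens/sec, which is why much of the paper is about minimizing and batching those reads.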
