
api, last Saturday at 2:19 PM

Prices are still coming down. Assuming that keeps happening, we will have laptops with enough RAM in the sub-$2k range within 5 years.

The question is whether models will keep getting bigger. If useful model sizes eventually plateau, a good model becomes something many people can easily run locally. If models keep usefully growing, that doesn't happen.

The largest ones I see are around 405B parameters, which quantized fits in 256GB of RAM.
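
As a rough sanity check on that figure, here is a back-of-the-envelope sketch. The numbers are assumptions: 4-bit weight quantization and a ~10% allowance for KV cache and runtime buffers.

```python
# Back-of-the-envelope memory math for a 405B-parameter model.
# Assumed: 4-bit weight quantization and ~10% overhead for
# KV cache, activations, and runtime buffers (rough guesses).

params = 405e9            # parameter count
bits_per_weight = 4       # e.g. a Q4-style quantization
overhead = 1.10           # allowance for KV cache / buffers

weights_gb = params * bits_per_weight / 8 / 1e9
total_gb = weights_gb * overhead

print(f"weights: ~{weights_gb:.0f} GB, with overhead: ~{total_gb:.0f} GB")
# weights: ~202 GB, with overhead: ~223 GB -> fits in 256 GB of RAM
```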

Long term, I expect custom hardware accelerators designed specifically for LLMs, basically ASICs, to show up. If those got affordable, I could see little USB-C accelerator boxes under $1k that run huge LLMs fast on less power.

GPUs are most efficient for batch inference, which lends itself to hosting rather than local use. What I mean is a lighter chip made to run small-batch or single-batch inference very fast using less power. That workload is memory-bandwidth bound, so I suspect fast RAM would be most of the cost of such a device.
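
To make the bandwidth argument concrete: during single-batch decoding, each generated token has to stream the active weights through memory roughly once, so tokens/sec is bounded by bandwidth divided by weight size. The bandwidth figures below are illustrative assumptions, not measurements of any specific device.

```python
# Rough upper bound on single-batch decode speed:
# tokens/sec ~= memory bandwidth / bytes of weights read per token.
# Bandwidth figures are illustrative assumptions, not measurements.

def max_tokens_per_sec(bandwidth_gb_s: float, weights_gb: float) -> float:
    return bandwidth_gb_s / weights_gb

weights_gb = 202.0  # 405B parameters at 4 bits per weight

for name, bw in [
    ("dual-channel DDR5 (~100 GB/s)", 100),
    ("Apple M-class unified memory (~800 GB/s)", 800),
    ("HBM-class accelerator (~3000 GB/s)", 3000),
]:
    print(f"{name}: ~{max_tokens_per_sec(bw, weights_gb):.1f} tok/s")
```

The sketch also shows why a hypothetical LLM ASIC would mostly be paying for fast memory: the compute per token is modest compared to the sheer volume of weights that must be moved.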


Replies

m-s-y, last Saturday at 4:18 PM

GPUs are already effectively ASICs for the math that runs both 3D scenes and LLMs, no?