Prices are still coming down. Assuming that keeps happening, we will have laptops with enough RAM in the sub-$2k range within 5 years.
The question is whether models will keep getting bigger. If useful model sizes eventually plateau, a good model becomes something many people can easily run locally. If models keep usefully growing, that doesn't happen.
The largest open models I see are around 405B parameters, which, quantized, fits in 256 GB of RAM.
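Back-of-the-envelope on why that fits (rough figures only; a real runtime also needs headroom for the KV cache and activations, so treat this as a sketch):

    # Rough memory estimate for a 405B-parameter model at different quantization levels.
    # Approximate figures; runtimes add KV-cache and activation overhead on top.
    def model_memory_gb(params_billion: float, bits_per_weight: float) -> float:
        return params_billion * 1e9 * bits_per_weight / 8 / 1e9  # decimal GB

    for bits in (16, 8, 4):
        print(f"405B at {bits}-bit: ~{model_memory_gb(405, bits):,.0f} GB")
    # -> ~810 GB, ~405 GB, ~202 GB; the 4-bit case fits in 256 GB with headroom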
Long term, I expect custom hardware accelerators designed specifically for LLMs to show up: basically an ASIC. If those got affordable, I could see little USB-C accelerator boxes under $1k able to run huge LLMs fast and at lower power.
GPUs are most efficient at batched inference, which lends itself to hosting rather than local use. What I mean is a lighter chip made to run small-batch or single-batch inference very fast at lower power. Single-batch inference is memory-bandwidth bound, so I suspect fast RAM would be most of the cost of such a device.
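To put rough numbers on that: in single-batch decoding, every generated token streams more or less the whole weight set from memory, so tokens/second is bounded by bandwidth divided by model size. A sketch with illustrative bandwidth figures (assumptions, not measurements):

    # Upper bound on single-batch decode speed: each token reads ~all weights once,
    # so tokens/sec <= memory bandwidth / model size. Bandwidths below are
    # illustrative assumptions, not measured numbers.
    def max_tokens_per_sec(bandwidth_gb_s: float, model_size_gb: float) -> float:
        return bandwidth_gb_s / model_size_gb

    model_gb = 203  # ~405B at 4-bit, from the estimate above
    for name, bw_gb_s in {
        "laptop LPDDR5 (~100 GB/s)": 100,
        "Apple Max-class unified memory (~400 GB/s)": 400,
        "datacenter-GPU HBM (~2000 GB/s)": 2000,
    }.items():
        print(f"{name}: <= {max_tokens_per_sec(bw_gb_s, model_gb):.1f} tok/s")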
GPUs are already effectively ASICs for the math that runs both 3D scenes and LLMs, no?