Hacker News

bigyabai yesterday at 5:18 PM

We have to get real here: most people are not replacing GPT or Claude with local inference, even on M5. If you can afford to do that (RAM shortage or not), then you are in the minority of customers.

Alleviating the memory constraint would only really make Nvidia a danger to cloud margins, and their consumer sales are neutered while they focus on the datacenter segment. It feels facetious to insinuate that people would be doing inference on their Macbook Neo or Wintel laptop if they only had a gorbillion gigabytes of memory and a 400W accelerator card plugged into the wall outlet.


Replies

kamranjon yesterday at 5:31 PM

You’re out of the loop if you don’t think M-series chips with unified memory are one of the best platforms for running local inference.
