
veunes, today at 7:02 AM (0 replies)

That's true for current LLMs, but Apple is playing the long game. First, they are masters of quantization (their 3-4-bit models perform surprisingly well). Second, Unified Memory is a cheat code: even 8GB on an M1/M2 enables things that are impossible on a discrete GPU with 8GB of VRAM, because the CPU and GPU share one memory pool and there's no PCIe transfer overhead. And for serious tasks there's the Mac Studio with 192GB of RAM, which is arguably the cheapest way to run Llama 3.1 405B locally: at roughly 3-bit quantization the weights come to about 405B × 3/8 ≈ 152GB, which fits in the unified pool.
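
If you want to poke at this yourself, here's a minimal sketch using Apple's MLX stack via the mlx-lm Python package. The model id is one of the community 4-bit conversions on Hugging Face and is only illustrative (swap in whatever fits your RAM), and the exact generate() keywords vary a bit between mlx-lm versions:

    # pip install mlx-lm  -- Apple Silicon only
    from mlx_lm import load, generate

    # Weights load straight into Unified Memory, shared by CPU and GPU,
    # so there is no separate VRAM copy step like on a discrete card.
    # Repo id is an example; check the mlx-community org for current ones.
    model, tokenizer = load("mlx-community/Meta-Llama-3.1-8B-Instruct-4bit")

    prompt = "Explain unified memory in one paragraph."
    print(generate(model, tokenizer, prompt=prompt, max_tokens=200))

The same script scales to much larger quantized models on a Mac Studio; the only knob that really matters is whether the quantized weights fit in the unified pool.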