
veunes, today at 7:02 AM (0 replies)

That's true for current LLMs, but Apple is playing the long game. First, they are masters of quantization (their 3-4-bit models perform surprisingly well). Second, Unified Memory is a cheat code: even 8GB on an M1/M2 enables things that are impossible on a discrete GPU with 8GB of VRAM, because the CPU and GPU share one memory pool and there's no PCIe transfer overhead. And for serious tasks there's the Mac Studio with 192GB of RAM, which is arguably the cheapest way to run Llama 3.1 405B locally: at roughly 3-bit quantization the weights come to about 405B × 3/8 ≈ 152GB, which fits in the unified pool.
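
If you want to poke at this yourself, here's a minimal sketch using Apple's MLX stack via the mlx-lm Python package. The model id is one of the community 4-bit conversions on Hugging Face and is only illustrative (swap in whatever fits your RAM), and the exact generate() keywords vary a bit between mlx-lm versions:

    # pip install mlx-lm  -- Apple Silicon only
    from mlx_lm import load, generate

    # Weights load straight into Unified Memory, shared by CPU and GPU,
    # so there is no separate VRAM copy step like on a discrete card.
    # Repo id is an example; check the mlx-community org for current ones.
    model, tokenizer = load("mlx-community/Meta-Llama-3.1-8B-Instruct-4bit")

    prompt = "Explain unified memory in one paragraph."
    print(generate(model, tokenizer, prompt=prompt, max_tokens=200))

The same script scales to much larger quantized models on a Mac Studio; the only knob that really matters is whether the quantized weights fit in the unified pool.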